Abstract

Black-box complexity studies lower bounds for the efficiency of general-purpose 
black-box optimization algorithms such as evolutionary algorithms and other search 
heuristics. Different models exist, each one designed to analyze a different aspect
of typical heuristics such as the memory size or the variation operators in use.
While most previous works focus on one particular such aspect, we consider in
this work how the combination of several algorithmic restrictions influences the black-box
complexity. Our testbed is the class of so-called OneMax functions, a classical set of test
functions that is intimately related to classic coin-weighing problems and to the board
game Mastermind.

We analyze in particular the combined memory-restricted ranking-based black-box
complexity of OneMax for different memory sizes. While its isolated memory-restricted
as well as its ranking-based black-box complexity for bit strings of length n
is only of order n/log n, the combined model does not allow for algorithms being faster
than linear in n, as can be seen by standard information-theoretic considerations. We
show that this linear bound is indeed asymptotically tight. Similar results are obtained
for other memory- and offspring-sizes. Our results also apply to the (Monte
Carlo) complexity of OneMax in the recently introduced elitist model, in which only
the best-so-far solution can be kept in the memory. Finally, we also provide improved
lower bounds for the complexity of OneMax in the regarded models.

Our result enlivens the quest for natural evolutionary algorithms optimizing OneMax
in o(n log n) iterations.


1 Introduction 

Black-box complexity aims at analyzing the influence of algorithmic choices such as the 
population size, the variation operators in use, or the selection principles on the optimization
time of evolutionary algorithms (EAs) and other (deterministic or randomized) search
heuristics. Lower bounds from black-box complexity theory provide information about the 
limits of certain classes of evolutionary algorithms (e.g., memory-restricted, ranking-based, 
or unbiased EAs), while upper bounds can serve as an inspiration for the development of 
new EAs. 

Unlike other complexity notions, black-box complexity is a measure for the number of
black-box queries that an algorithm makes in order to optimize an unknown function f. That
is, we simply count the number of function evaluations f(x) that are needed (usually in
expectation, but cf. Section 2.1 below) until for the first time an optimal search point
x ∈ arg max f is evaluated. Black-box complexity typically disregards all computational efforts
that an algorithm executes between any two different such function evaluations. In “classical”
theoretical computer science (TCS), black-box complexity is often referred to as (randomized)
query complexity. While it is a well-known complexity notion there, the focus in the
broader TCS community is typically on having a simplified complexity measure for sorting,
coin-weighing, and other problems, and not, as is the case in evolutionary computation, on
analyzing the impact of the above-mentioned algorithmic choices on the performance of
general-purpose problem solvers.

1.1 Related Work on Black-Box Complexity 

In the context of evolutionary computation (EC), black-box complexity was first studied
by Droste, Jansen, (Tinnefeld,) and Wegener in [DJTW03] and [DJW06]. The authors 
regard two different black-box models, an unrestricted version, in which the algorithms have 
arbitrary memory and full access to function values, and a memory-restricted one, in which 
the algorithms are allowed to store only a limited number of previously queried search points 
and their function values. While the unrestricted model is mostly used for analyzing lower 
bounds, the memory-restricted model is studied in the context of upper bounds. Since most 
EAs have a limited population size, these upper bounds typically provide a better comparison 
for the efficiency of different algorithmic approaches. 

The theory seemed to have come to an early end afterward, since even the memory-restricted
version yielded black-box complexities that were unreasonably low compared to
the performance of evolutionary algorithms. Thus, the notion appeared to be of little use
for the understanding of such algorithms. However, the field experienced a major revival
with the work of Lehre and Witt [LW10, LW12] on the unbiased black-box model. In this
version, new search points can be obtained by the algorithm only by sampling uniformly at
random from the underlying search space, or (for the search space being the n-dimensional
hypercube {0,1}^n) by combining previously queried search points in a way that discriminates
neither between the bit positions 1, 2, ..., n nor between the bit values 0 and 1. Many
EAs use variation operators of this unbiased type.

Lehre and Witt showed that their unbiased black-box complexity notion can give
much better estimates for the efficiency of typical EAs than the previous models. This also
applies to the so-called OneMax problem, whose unrestricted black-box complexity is only
of order n/log n [ER63, DJW06, AW09], while its unary unbiased black-box complexity is of
order n log n [LW12, Theorem 6], thus matching the expected optimization time of search
heuristics such as the so-called (1+1) EA and Randomized Local Search. The (generalized)
OneMax problem is to identify an unknown bit string z if with each query x the
algorithm learns the number Om_z(x) := |{i ∈ {1, 2, ..., n} | x_i = z_i}| of bit positions in
which x and z agree (in other words, Om_z(x) equals n minus the Hamming distance of x
and z). This problem can be seen as a generalization of the popular Mastermind game with
two colors (cf. [DW14a]), and is one of the easiest pseudo-Boolean optimization problems as
it only requires trap-free hill-climbing. As such, it is typically one of the first test problems
that is regarded when introducing a new black-box model.

It was left as an open question in [DJW06] whether or not restricting the memory of
an algorithm already yields a similar runtime bound of Ω(n log n) for the optimization of
OneMax. This hope was dashed in [DW14a], where it has been shown that even for the
smallest possible memory size, in which algorithms may store only one previously queried
search point and its fitness, an O(n/log n) algorithm exists. Similarly, in the ranking-based
black-box model, in which the algorithms learn only the ranking of the function values, but
not their absolute values, OneMax can still be solved in an expected number of O(n/log n)
function evaluations [DW14b].

1.2 Our Results 

While previous work in black-box complexity theory focused on analyzing the influence 
of single restrictions on the efficiency of the algorithms under consideration, we regard in 
this work combinations of such algorithmic constraints. As a testbed, we regard the
above-mentioned class of OneMax functions. Since for this problem many, often provably tight,
bounds are available for the single-restriction models, we can easily compare our results to 
see how the combined restrictions impact the best-possible optimization times, cf. Table 1 
below. 

In a first step, we study the combined memory-restricted ranking-based model, i.e., we
study the black-box complexity of OneMax with respect to (μ + λ) memory-restricted
ranking-based algorithms. Algorithms fitting this framework are allowed to store up to μ
previously queried search points and their ranking with respect to the underlying objective
function f. (Solely) from this information, the algorithms then generate and query λ new
search points (so-called offspring). They receive information on how these newly generated
search points perform with respect to the parent population (more precisely, the full ranking
of the search points with respect to f is revealed to the algorithms), and the algorithms
then select an arbitrary subset of μ of these search points, which form the parent population
of the next iteration. This process continues until a search point x ∈ arg max f is queried
for the first time.

For the most restrictive case μ = λ = 1 (i.e., the often regarded (1+1) scheme), the
algorithms under consideration are easily seen to be comparison-based, i.e., they learn with
each query only whether the offspring has better, equal, or worse fitness than its parent.
Therefore, by a simple information-theoretic argument (cf., e.g., [DJW06, Theorem 2]), their
expected optimization time on OneMax is at least linear in n. This already shows that
the combined (1+1) memory-restricted ranking-based black-box complexity of OneMax
is asymptotically larger than either the pure ranking-based or the pure memory-restricted
version. However, this is not the end of the story. In this work we show lower bounds
for the combined (1+1) model that are by a constant factor stronger than the best known
bounds for comparison-based algorithms. Thus they are stronger than any bound obtained
by reducing the combined model to an existing black-box model with a single restriction. On
the other hand, we show that the mentioned linear lower bound is asymptotically tight. That
is, we provide a linear-time algorithm solving OneMax in a (1+1) scheme and using only
relative fitness information. Also for many other combinations of μ and λ we show that the
information-theoretic lower bound is matched by a (μ + λ) memory-restricted ranking-based
algorithm.
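For readers who want the counting behind this information-theoretic argument spelled out, here is a minimal sketch in its worst-case form for deterministic algorithms. It assumes only that each query reveals one of at most three outcomes (better, equal, worse). The expected-time version additionally needs Yao's Principle, whose application to memory-restricted algorithms requires the care discussed in Section 4.2, and this sketch does not reproduce the sharper constants proved in this work.

```latex
% A deterministic comparison-based algorithm is a ternary decision tree whose
% nodes are queries. If no run uses more than d queries, the tree has at most
\sum_{i=0}^{d-1} 3^i \;=\; \frac{3^d - 1}{2} \;<\; 3^d \quad \text{query nodes.}
% Each target z \in \{0,1\}^n must itself be queried on its own root-to-leaf
% path, and each node queries one point, so there are at least 2^n query nodes:
2^n \;<\; 3^d \quad\Longrightarrow\quad d \;>\; n \log_3 2 \;\approx\; 0.63\, n .
```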

In a next step, we combine the memory-restricted ranking-based model with yet another
restriction, namely with the elitist selection requirement recently introduced in
the work [DL15a] on elitist black-box models. In this context, we additionally require that
the algorithm selects the μ fittest individuals out of the μ + λ parents and offspring (where
it may break ties arbitrarily).¹ Notably, the achievable optimization times stay the same
(asymptotically), though in a slightly different sense, as we shall discuss below. This is rather
surprising, as all previous black-box optimal algorithms make substantial use of non-elitist
selection.

Table 1, taken from [DD14] and extended to cover the results of the present paper, 
summarizes known lower and upper bounds of the complexity of OneMax in the different 
black-box models. Bounds given without reference follow trivially from identical bounds 
in stronger models, e.g., the Ω(n/log n) lower bound for the memory-restricted black-box
complexity follows directly from the same bound for the unrestricted model. 

A short version of this work has been presented at the Genetic and Evolutionary
Computation Conference (GECCO 2015) in Madrid, Spain [DL15b].

1.3 Relevance of Our Work and Techniques 

While at first glance the obtained upper bounds may seem to be a shortcoming of the
model (most EAs need Ω(n log n) steps to optimize OneMax functions), this does not have
to be the case. In light of [DDE15], where a simple and natural EA has been designed that optimizes
OneMax in o(n log n) time, it is well possible that such a result can be extended further (of
particular interest is an extension to (1+1)-type algorithms). As we know from [DDE15],
black-box complexity results like our mentioned OneMax bound can give an inspiration for
developing such algorithms.

One obvious challenge for designing algorithms in the combined memory-restricted 
ranking-based model is the fact that the best-known algorithms in the single-restriction 
case either make heavy use of knowing the absolute fitness values (in the memory-restricted 

¹As mentioned in [DL15a], we remark that the usage of “elitist selection” is not standardized in the EA
literature. Some subcommunities would therefore rather call our elitist black-box model a black-box model
with truncation selection.
Model                                          | Lower Bound | Ref.           | Upper Bound | Ref.
-----------------------------------------------+-------------+----------------+-------------+----------------
unrestricted                                   | Ω(n/log n)  | info.-theo.    | O(n/log n)  | [ER63, AW09]
unbiased, arity 1                              | Ω(n log n)  | [LW12]         | O(n log n)  |
unbiased, arity 2 ≤ k ≤ log n                  | Ω(n/log n)  |                | O(n/k)      | [DW14c, DDE15]
r.b. unrestricted                              | Ω(n/log n)  |                | O(n/log n)  | [DW14b]
r.b. unbiased, arity 1                         | Ω(n log n)  |                | O(n log n)  |
r.b. unbiased, arity 2 ≤ k ≤ n                 | Ω(n/log n)  |                | O(n/log k)  | [DW14b]
(1+1) comparison-based                         | Ω(n)        | info.-theo.    | O(n)        |
(1+1) memory-restricted                        | Ω(n/log n)  |                | O(n/log n)  | [DW14a]
(1+1) elitist Las Vegas                        | Ω(n)        |                | O(n log n)  |
(1+1) elitist (log n/n)-Monte Carlo            | Ω(n)        |                | O(n)        | Thm. 8
(2+1) elitist Monte Carlo/Las Vegas            | Ω(n)        | Thm. 6         | O(n)        | Thm. 7
(1+λ) elitist Monte Carlo (# generations)      | Ω(n/log λ)  |                | O(n/log λ)  | Thm. 15
(μ+1) elitist Monte Carlo                      | Ω(n/log μ)  |                | O(n/log μ)  | Thm. 16
(1,λ) elitist Monte Carlo/Las Vegas (# gen.)   | Ω(n/log λ)  | cf. Section 10 | O(n/log λ)  | Thm. 18

Table 1: The black-box complexities of OneMax in the different models. r.b. abbreviates
ranking-based; info.-theo. denotes the information-theoretic bound [Yao77], cf. also [DJW06]; for
(1 + λ) and (1, λ) we assume 1 < λ < 2^{n^{1−ε}} for some ε > 0, and for (μ + 1) we assume that
μ = ω(log² n/log log n) and μ < n.


case, see [DW14a]) or of having access to a large number of previously queried search points 
(in the ranking-based case, cf. [DW14b]). It is thus not obvious how to design efficient 
algorithms respecting both restrictions at the same time. Our results therefore require
approaches and strategies that are significantly different from those found in previous works,
though, on the other hand, we can and also do make significant use of several ideas developed
in previous works on OneMax in the different black-box models. For example, for the (1+1) 
memory-restricted ranking-based elitist black-box model the algorithm certifying the linear 
upper bound nicely combines previous techniques from the black-box complexity literature 
with some newly developed tools such as the neutral counters designed in Section 7.2.3. We 
believe that the insights from these tools will be useful in future research in evolutionary 
computation, both in algorithm analysis and in algorithm design. 

For the lower bounds, a technical difficulty that we face in the proofs is a putative
non-applicability of Yao’s Principle. More precisely, there may be randomized algorithms that
even in the worst case perform much better than any deterministic algorithm on a random
problem instance, cf. Section 4.2 and [DL15a]. We overcome these problems by expanding
the class of algorithms regarded. This needs some care, as we do not want to decrease the
complexity too much by this expansion.

1.4 Structure of the Paper 

Our paper is structured as follows. We start with a formal introduction of the models in 
Section 2, followed by a brief discussion on the difference between Las Vegas and Monte 
Carlo complexities, which can be crucially different in memory-restricted models. In a
nutshell, the Las Vegas complexity measures the expected time until an optimal search point
is hit, while the p-Monte Carlo complexity asks for the time needed until an optimum is
hit with probability at least 1 − p. These bounds can be exponentially far apart, as shown
in [DL15a], and thus need to be regarded separately. In Section 3 we formally introduce
the generalized OneMax functions and recapitulate the known bounds on their complexity in
different black-box models. We conclude the introductory sections by providing some basic
tools in Section 4. 

In Section 5 we provide the mentioned lower bounds for the (μ + λ) memory-restricted
ranking-based black-box complexity of OneMax for a wide range of μ and λ. For the upper
bounds, most of our proofs work directly in the elitist black-box model, so the remainder of
the paper is devoted to the proofs of such upper bounds in the elitist model, which imply the
same upper bounds for the memory-restricted ranking-based model. We first give a simple
linear upper bound for the (2+1) (Las Vegas and Monte Carlo) elitist black-box complexity of
OneMax (Section 6). At the heart of this paper is Section 7, where we show the linear upper
bound for the (1+1) Monte Carlo elitist black-box complexity of OneMax. In Sections 8
and 9, we consider more generally (1 + λ) and (μ + 1) elitist black-box algorithms. Finally,
in Section 10 we give some remarks on the (μ, λ) elitist black-box complexities of OneMax
and point out some important differences from the (μ + λ) complexities.


2 Black-Box Models and Complexity Measures 

We are primarily interested in analyzing the memory-restricted ranking-based black-box 
complexities of OneMax. An important difference to purely memory-restricted algorithms 
is that the available memory is strictly smaller in this combined memory-restricted and 
ranking-based model. If we regard, for example, the (1+1) case, then in the purely memory- 
restricted model the algorithm does not only have access to the current search point, but 
also to its fitness value. It thus has strictly more than n bits of information when sampling
the offspring. If, on the other hand, the algorithm is in addition also ranking-based, then it
may not access the fitness; thus its available information is restricted to exactly n bits. So
the fitness-based variant effectively has a larger available memory than the ranking-based
one (but of course neither is completely free in how to use the memory).

Formally, a (μ + λ) memory-restricted, ranking-based black-box algorithm maintains a
population (parent generation) of μ search points and knows the ranking of their fitnesses.
Based solely on this information it samples λ additional search points (offspring) and
receives the ranking of all μ + λ fitnesses. From the parent generation and the offspring,
it selects μ search points to form the new parent generation. A (μ + λ) memory-restricted,
ranking-based black-box algorithm is elitist if in the selection step it selects the μ best search
points with respect to the ranking. The algorithm may break ties arbitrarily: for example,
if all μ + λ search points have the same fitness, then it may choose an arbitrary subset of
size μ to form the next parent generation. The formal structure of a (μ + λ) elitist black-box
algorithm is given by Algorithm 1.

Algorithm 1: The (μ + λ) elitist black-box algorithm for maximizing an unknown
function f: {0,1}^n → ℝ

1  Initialization:
2    X ← ∅;
3    for i = 1, ..., μ do
4      Depending only on the multiset X and the ranking ρ(X, f) of X induced by f,
       choose a probability distribution p^(i) over {0,1}^n and sample x^(i) according to p^(i);
5      X ← X ∪ {x^(i)};
6  Optimization: for t = 1, 2, 3, ... do
7    Depending only on the multiset X and the ranking ρ(X, f) of X induced by f,
     choose a probability distribution p^(t) on ({0,1}^n)^λ and sample (y^(1), ..., y^(λ))
     according to p^(t);
8    Set X ← X ∪ {y^(1), ..., y^(λ)};
9    for i = 1, ..., λ do Select x ∈ arg min X and update X ← X \ {x};

Note that the only difference to the (μ + λ) memory-restricted ranking-based black-box
model is the enforced elitist selection in line 9, which in the former model can be replaced
by

     for i = 1, ..., λ do Select x ∈ X and update X ← X \ {x};
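To make the interface of Algorithm 1 concrete, the following Python sketch simulates the (μ + λ) elitist scheme as a test harness. It is an illustration under simplifying assumptions, not part of the formal model: the names `ranks`, `run_elitist`, `init`, and `vary` are hypothetical, initial samples ignore the ranking (a special case of lines 1 to 5), and ties in the elitist selection of line 9 are broken uniformly at random. The strategy passed in only ever sees search points and ranks; the harness alone reads f, both to enforce elitist selection and to stop as soon as an optimum is sampled.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]  # a search point in {0,1}^n


def ranks(points: Sequence[Point], f: Callable[[Point], int]) -> List[int]:
    """Rank of each point among the distinct f-values occurring (0 = worst).
    Strategies only ever see these ranks, never the f-values themselves."""
    levels = sorted({f(x) for x in points})
    return [levels.index(f(x)) for x in points]


def run_elitist(f: Callable[[Point], int],
                is_optimal: Callable[[Point], bool],
                mu: int, lam: int,
                init: Callable[[random.Random], Point],
                vary: Callable[[List[Point], List[int], random.Random], List[Point]],
                rng: random.Random,
                max_evals: int = 100_000) -> Optional[int]:
    """Simulate a (mu + lam) elitist strategy given by init and vary.
    Returns the number of evaluations until an optimum is sampled,
    or None if max_evals is exhausted first."""
    evals = 0
    parents: List[Point] = []
    for _ in range(mu):                     # initialization phase
        x = init(rng)
        evals += 1
        if is_optimal(x):
            return evals
        parents.append(x)
    while evals < max_evals:                # optimization phase
        offspring = vary(parents, ranks(parents, f), rng)
        if len(offspring) != lam:
            raise ValueError("vary must return exactly lam offspring")
        for y in offspring:
            evals += 1
            if is_optimal(y):
                return evals
        pool = parents + offspring
        rng.shuffle(pool)                   # random order, so that ties are
        pool.sort(key=f, reverse=True)      # broken randomly by the stable sort
        parents = pool[:mu]                 # elitist: discard the lam worst
    return None
```

Plugging in a `vary` that flips one uniformly chosen bit of the single parent yields an RLS-like (1+1) elitist algorithm; other strategies respecting the same restrictions only require a different `vary`.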

Since the elitist model is more restrictive than the combined memory-restricted ranking-based
one, every upper bound on the (μ + λ) elitist black-box complexity also holds for
the (μ + λ) memory-restricted ranking-based black-box complexity. As discussed in [DL15a],
several variants of the elitist model exist, but this is beyond the scope of the present paper.

Following the standard convention for black-box optimization, we define the runtime (or
optimization time) of a (μ + λ) black-box algorithm A to be the number of search points
sampled by A until an optimal search point is sampled for the first time (samples are counted
with multiplicities if they are sampled several times). Since a (μ + λ) algorithm samples λ
search points in each generation, the runtime of an algorithm after t generations is μ + λt,
but see also our comment at the end of Section 2.1.

2.1 Las Vegas vs. Monte Carlo Complexities 

Elitist black-box algorithms cannot do simple restarts, since a solution intended for a restart is
not allowed to be accepted into the population if its fitness is not as good as those of the search
points currently in the memory. Regarding expected runtimes can therefore be significantly
different from regarding algorithms with an allowed positive failure probability. In fact, it is
not difficult to see that these two notions can be exponentially far apart [DL15a, Theorem
3]. One may argue that this is a rather artificial problem, since in practice there is no
reason why one would not want to allow restarts. Also, almost all algorithms used to show
upper bounds in the previous black-box models have small complexity only because of the
possibility of doing random restarts. One convenient way around this problem is to allow for
small probabilities of failure. Such (high) probability statements are actually often found in
the evolutionary computation literature. The following definition captures their spirit.

For a black-box algorithm A and a class ℱ of functions, let T be the smallest number of function
evaluations that is needed such that for every problem instance f ∈ ℱ the optimum of f is found with
probability at least 1 − p. We call T = T(A, ℱ) the p-Monte Carlo black-box complexity
of A on ℱ. The p-Monte Carlo black-box complexity of ℱ with respect to a class 𝒜 of
algorithms is inf_{A ∈ 𝒜} T(A, ℱ). If we make a statement about the Monte Carlo complexity
without specifying p, then we mean that for every constant p > 0 the statement holds for the
p-Monte Carlo complexity. However, we sometimes also regard p-Monte Carlo complexities
for non-constant p = p(n) = o(1), thus yielding high-probability statements.
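As a rough empirical counterpart of this definition, the following Python sketch estimates T(A, ℱ) from simulated runs on a finite set of instances: on each instance it takes the smallest T such that at least a (1 − p) fraction of the recorded runs found the optimum within T evaluations, and then it takes the maximum over instances. The function name and input format are hypothetical; the sketch assumes that every recorded run eventually succeeded, and it is only an estimate, since the definition quantifies over the exact runtime distribution and over all of ℱ.

```python
import math
from typing import Sequence


def empirical_mc_complexity(runtimes_per_instance: Sequence[Sequence[int]],
                            p: float) -> int:
    """Empirical p-Monte Carlo runtime: smallest T such that, on every
    instance, at least a (1 - p) fraction of the recorded runs had
    runtime <= T."""
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    worst = 0
    for runs in runtimes_per_instance:
        s = sorted(runs)
        # number of runs that must succeed within T; the tiny slack guards
        # against floating-point results landing just above an integer
        k = max(1, math.ceil((1 - p) * len(s) - 1e-9))
        worst = max(worst, s[k - 1])
    return worst
```

For instance, with runtimes [1, 2, 3, 4] on one instance and [5, 6, 7, 8] on another, p = 1/2 requires two successes per instance and gives 6.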

The standard black-box complexity (which regards the maximal expected time that an
algorithm A needs to optimize any f ∈ ℱ) is called Las Vegas black-box complexity in [DL15a].
We adopt this terminology.

We recall from [DL15a] that, by Markov’s inequality, every Las Vegas algorithm is also
(up to a factor of 1/p in the runtime) a p-Monte Carlo algorithm. We also repeat the 
following statement which is a convenient tool to bound p-Monte Carlo complexities. 

Remark 1 (Remark 1 in [DL15a]). Let p ∈ (0, 1). Assume that there is an event ℰ of
probability p_ℰ < p such that, conditioned on ¬ℰ, the algorithm A finds the optimum after
expected time at most T. Then the p-Monte Carlo complexity of A on f is at most
(p − p_ℰ)^{−1} T. In particular, if p − p_ℰ = Ω(1), then the p-Monte Carlo complexity is O(T).
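The reasoning behind Remark 1 is a conditional application of Markov’s inequality. The following sketch spells it out, writing R for the runtime of A on f, ℰ for the exceptional event, and p_ℰ for its probability, and assuming p_ℰ < p as in the remark. Setting p_ℰ = 0 recovers the Markov argument quoted above for turning a Las Vegas algorithm into a p-Monte Carlo algorithm at the cost of a factor 1/p.

```latex
% Choose the evaluation budget s := T / (p - p_{\mathcal{E}}).
% Markov's inequality, applied conditionally on \neg\mathcal{E}:
\Pr[R \ge s \mid \neg\mathcal{E}]
  \le \frac{\mathrm{E}[R \mid \neg\mathcal{E}]}{s}
  \le \frac{T}{s} = p - p_{\mathcal{E}} .
% Hence failing to find the optimum within s evaluations has probability
\Pr[R > s]
  \le \Pr[\mathcal{E}] + \Pr[\neg\mathcal{E}] \cdot \Pr[R \ge s \mid \neg\mathcal{E}]
  \le p_{\mathcal{E}} + (1 - p_{\mathcal{E}})(p - p_{\mathcal{E}})
  \le p .
```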

For some applications it is more natural to count the number of generations rather than 
the number of sampled search points (e.g., because the evaluations of different search points 
may be parallelizable). For this reason, we give some complexities also for the number of 
generations, cf. Table 1. All definitions above transfer analogously, with the runtime of
an algorithm replaced by the number of generations needed before an optimal search point
is sampled for the first time. However, note that all black-box complexities refer to the
expected runtime unless explicitly stated otherwise. 

3 Background on OneMax Complexities and Overview of Results

One of the most prominent problems in the theory of randomized search heuristics is the 
running time of evolutionary algorithms and other heuristics on the OneMax problem. 
OneMax is the function that counts the number of ones in a bit string. Maximizing OneMax
thus corresponds to finding the all-ones string.

Search heuristics are typically invariant with respect to the problem encoding, and as 
such they have the same runtime for any function from the generalized OneMax function 
class 


$$\mathrm{OneMax} := \{\mathrm{Om}_z \mid z \in \{0,1\}^n\},$$

where Om_z is the function

$$\mathrm{Om}_z \colon \{0,1\}^n \to \mathbb{R}, \quad x \mapsto n - \sum_{i=1}^{n} (x_i \oplus z_i), \tag{1}$$

assigning to x the number of positions in which x and z agree. We call z, the unique global
optimum of the function Om_z, the target string of Om_z. Whenever we speak of the OneMax
problem or a OneMax function, we mean the whole class of OneMax functions or an
unknown member of it, respectively.
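A direct transcription of equation (1) into Python may help when experimenting with the algorithms discussed below; the function name `om` and the tuple encoding of bit strings are our own choices for illustration. Taking z to be the all-ones string recovers the classic OneMax function that counts ones.

```python
from typing import Sequence


def om(z: Sequence[int], x: Sequence[int]) -> int:
    """Om_z(x) = n - sum_i (x_i XOR z_i), i.e. the number of positions
    in which x and the target string z agree."""
    if len(x) != len(z):
        raise ValueError("x and z must both have length n")
    return len(z) - sum(xi ^ zi for xi, zi in zip(x, z))
```

For example, om((1, 1, 1), (0, 1, 1)) evaluates to 2, the number of ones in (0, 1, 1).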

The OneMax problem is by far the most intensively studied problem in the runtime analysis
literature and, due to its close relation to the classic board game Mastermind [DW14a],
to cryptographic applications, and to coin-weighing problems, it is also studied in other areas
of theoretical computer science. Also for black-box complexities it is the most commonly
found test problem. Without going too much into detail, we recall that the unrestricted
black-box complexity of OneMax is Θ(n/log n) [DJW06, AW09, ER63]. While the lower
bound is a simple application of Yao’s Principle (Lemma 4; cf. [DJW06] for a detailed
explanation of the Ω(n/log n) lower bound), the upper bound is achieved by an extremely simple,
yet elegant algorithm: sampling O(n/log n) random search points and regarding their fitness
values reveals, with high probability, the target string z. We shall make use of (variants of)
this strategy in some of our proofs of upper bounds.
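The following Python sketch illustrates the idea behind this random-sampling strategy on a toy scale: it queries m uniformly random strings and then keeps every candidate z that is consistent with all observed values. It is an assumption-laden illustration, not the algorithm of [ER63, AW09]: the candidate filtering is a brute-force search over all 2^n strings, feasible only for very small n; the sample size m is a free (hypothetical) parameter rather than the O(n/log n) bound from the literature; and a final query of the identified string is still needed to actually hit the optimum.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Bits = Tuple[int, ...]


def om(z: Sequence[int], x: Sequence[int]) -> int:
    """Om_z(x): number of positions in which x and z agree."""
    return sum(a == b for a, b in zip(x, z))


def consistent_targets(samples: List[Tuple[Bits, int]], n: int) -> List[Bits]:
    """All z in {0,1}^n agreeing with every recorded (x, Om_z(x)) pair.
    Brute force over 2^n candidates, so only usable for small n."""
    return [z for z in itertools.product((0, 1), repeat=n)
            if all(om(z, x) == v for x, v in samples)]


def random_sampling_candidates(target: Bits, m: int,
                               rng: random.Random) -> List[Bits]:
    """Query m uniformly random strings against the hidden target, then decode."""
    n = len(target)
    samples = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        samples.append((x, om(target, x)))  # the only access to the target
    return consistent_targets(samples, n)
```

As m grows, the candidate list typically shrinks to the single string z; how quickly this happens is what the O(n/log n) analysis quantifies.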

Another important bound for the OneMax problem is the simple Θ(n) bound for
comparison-based algorithms as introduced in [TG06].² Since (1+1) memory-restricted
ranking-based algorithms are comparison-based, this gives a linear lower bound for their
complexity on OneMax.
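The linear upper bound mentioned in the footnote to this paragraph can be realized by an algorithm that checks one bit at a time. The Python sketch below is one way to write it down (the name `bit_by_bit` is hypothetical): it keeps a single current point, flips each position once from left to right, and moves to the offspring only if the oracle reports it as strictly better, so it uses nothing but comparisons. On a OneMax function every flip changes the value by exactly one, so each bit is correct after it has been checked, and n + 1 queries suffice.

```python
import random
from typing import Callable, List, Sequence, Tuple


def bit_by_bit(f: Callable[[Sequence[int]], int], n: int,
               rng: random.Random) -> Tuple[List[int], int]:
    """(1+1) comparison-based sketch: flip bit i of the current point x for
    i = 0, ..., n-1 and keep the flip iff f(y) > f(x).
    Returns the final point and the number of queries the algorithm made."""
    x = [rng.randint(0, 1) for _ in range(n)]
    queries = 1                      # initial query of x
    for i in range(n):
        y = x.copy()
        y[i] ^= 1                    # offspring differs from x in bit i only
        queries += 1                 # query of y
        if f(y) > f(x):              # only the comparison outcome is used
            x = y
    return x, queries
```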

Remark 2. The (1+1) memory-restricted ranking-based black-box complexity of OneMax 
is Ω(n), thus implying a linear lower bound for the (1+1) elitist Las Vegas and Monte Carlo
black-box complexity of OneMax. 

If we consider the leading constants hidden in the Ω-notation, then the lower bounds
coming from the comparison-based complexity are not optimal. In Theorem 6 we will prove lower
bounds for memory-restricted ranking-based algorithms that are by a non-trivial constant
factor higher than the best known bounds for comparison-based algorithms.

Our upper bounds will show that there are elitist black-box optimization algorithms 
optimizing OneMax much more efficiently than typical heuristics like RLS or evolutionary 
algorithms. In particular we show that the (1+1) elitist Monte Carlo black-box complexity 
is at most linear (which is best possible by Theorem 6). Our results are summarized in the 
lower part of Table 1. Note that the upper bounds for elitist algorithms immediately imply 
upper bounds for the (Monte Carlo and Las Vegas) black-box complexity of OneMax in 
the respective memory-restricted ranking-based models. The lower bounds also carry over in 

²The lower bound is again a simple application of Yao’s Principle (Lemma 4), while the upper bound is
attained, for example, by the algorithm which checks one bit at a time, going through the bit string from one
end to the other. Alternatively, the upper bound is also verified by the (1 + (λ, λ)) GA of [DDE15], thus
showing that it can also be achieved by unbiased algorithms of arity two.
asymptotic terms (i.e., up to constant factors), cf. Theorem 6. Since the memory-restricted
ranking-based bounds were the original motivation for our work, we collect them in the 
following statement. 

Corollary 3. The (1+1) memory-restricted ranking-based (Las Vegas) black-box complexity
of OneMax is Θ(n). For 1 < λ < 2^{n^{1−ε}}, ε > 0 being an arbitrary constant, its
(1+λ) memory-restricted ranking-based black-box complexity is Θ(n/log λ) (in terms of
generations), while for μ = ω(log²(n)/log log n) its (μ + 1) memory-restricted ranking-based
black-box complexity is Θ(n/log μ).³

4 Tools 

In this section we list some tools that we need to study the (μ + λ) memory-restricted
ranking-based black-box and the (μ + λ) elitist black-box complexity. More precisely, we
recapitulate the RLS algorithm, Yao’s principle, and a Negative Drift Theorem.

4.1 Random Local Search 

A very simple heuristic optimizing OneMax in O(n log n) steps is Randomized Local Search
(RLS). Since this heuristic will be important in later parts of this paper, we state it here
for the sake of completeness. RLS, whose pseudo-code is given in Algorithm 2, is initialized
with a uniform sample x. In each iteration one bit position j ∈ [n] := {1, 2, ..., n} is chosen
uniformly at random. The j-th bit of x is flipped and the fitness of the resulting search
point y is evaluated. The better of the two search points x and y is kept for future iterations
(favoring the newly created individual in case of ties). As is easily verified, RLS is a unary
unbiased (1+1) elitist black-box algorithm, where we understand unbiasedness in the sense
of Lehre and Witt [LW12].


Algorithm 2: Randomized Local Search for maximizing f: {0,1}^n → ℝ.

Initialization: Sample x ∈ {0,1}^n uniformly at random and query f(x);
Optimization: for t = 1, 2, 3, ... do
    Choose j ∈ [n] uniformly at random;
    Set y ← x ⊕ e_j and query f(y);   // mutation step: flip the j-th bit of x
    if f(y) ≥ f(x) then x ← y;        // selection step
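For experimentation, Algorithm 2 translates almost line by line into Python. The sketch below is ours: the name `rls`, the predicate `is_optimal`, and the iteration cap `max_iters` are illustrative assumptions, since Algorithm 2 itself runs forever and the optimality check belongs to the experimenter, not to the algorithm. It returns the number of function evaluations until an optimal search point is first sampled, matching the runtime definition of Section 2.

```python
import random
from typing import Callable, List, Optional


def rls(f: Callable[[List[int]], int], n: int,
        is_optimal: Callable[[List[int]], bool],
        rng: random.Random, max_iters: int = 10**6) -> Optional[int]:
    """Randomized Local Search (Algorithm 2); returns the number of
    evaluations until an optimum is sampled, or None if the cap is hit."""
    x = [rng.randint(0, 1) for _ in range(n)]   # uniform initial sample
    fx, evals = f(x), 1
    if is_optimal(x):
        return evals
    for _ in range(max_iters):
        j = rng.randrange(n)                     # j uniform in [n] (0-based)
        y = x.copy()
        y[j] ^= 1                                # y = x XOR e_j
        fy = f(y)
        evals += 1
        if is_optimal(y):
            return evals
        if fy >= fx:                             # ties favor the offspring
            x, fx = y, fy
    return None
```

Averaging `rls` over many seeds for growing n is a simple way to observe the O(n log n) behavior stated above; we do not report measurements here.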


³We do not consider in this work (μ + λ) elitist algorithms for μ and λ both being strictly greater than
one. We feel that the required tools are given in the (1 + λ) and (μ + 1) settings, so that analyzing the
additional settings would not give sufficiently many new insights.
4.2 Yao’s Principle

We will use the following formulation of Yao’s principle. See [DL15a] for a more detailed
exposition of Yao’s principle in the context of elitist black-box complexity. 

Lemma 4 (Yao’s Principle [Yao77, MR95]). Let Π be a problem with a finite set ℐ of input
instances (of a fixed size) permitting a finite set 𝒜 of deterministic algorithms. Let p be a
probability distribution over ℐ and q be a probability distribution over 𝒜. Then,

$$\min_{A \in \mathcal{A}} \mathrm{E}[T(I_p, A)] \le \max_{I \in \mathcal{I}} \mathrm{E}[T(I, A_q)], \tag{2}$$

where I_p denotes a random input chosen from ℐ according to p, A_q a random algorithm
chosen from 𝒜 according to q, and T(I, A) denotes the runtime of algorithm A on input I.
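For finite problems, inequality (2) can be checked numerically by writing the runtimes as a matrix. The Python sketch below (function names and matrix layout are our own) does this for a hand-made 2 × 2 example and for random matrices. It mirrors the short argument behind (2): the left-hand side is at most the q-weighted average of the deterministic algorithms' expected runtimes on I_p, which equals the p-weighted average over inputs of the expected runtime of A_q, which in turn is at most the maximum over inputs. As the next paragraph explains, the subtle point for memory-restricted algorithms is not this inequality but whether every randomized algorithm of interest is such a q-mixture of deterministic ones.

```python
import random
from typing import List, Sequence, Tuple


def yao_sides(T: Sequence[Sequence[float]], p: Sequence[float],
              q: Sequence[float]) -> Tuple[float, float]:
    """T[i][a] is the runtime of deterministic algorithm a on input i.
    Returns (min_a E_{I~p}[T(I, a)], max_i E_{A~q}[T(i, A)])."""
    inputs, algs = range(len(T)), range(len(T[0]))
    lhs = min(sum(p[i] * T[i][a] for i in inputs) for a in algs)
    rhs = max(sum(q[a] * T[i][a] for a in algs) for i in inputs)
    return lhs, rhs


def random_distribution(k: int, rng: random.Random) -> List[float]:
    """A random probability vector of length k (strictly positive entries)."""
    w = [rng.random() + 1e-12 for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]
```

In the 2 × 2 example with runtimes [[1, 3], [3, 1]] and uniform p and q, both sides equal 2.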

For most problem classes, Yao’s principle implies that the runtime T of a best-possible
deterministic algorithm on a random input is a lower bound to the best-possible
performance of a random algorithm on an arbitrary input. However, this is not true for (μ + λ)
memory-restricted or elitist algorithms, since there are randomized memory-restricted (or
elitist) algorithms that are not convex combinations of deterministic ones (i.e., that cannot
be obtained by deciding randomly on one deterministic algorithm, and then running this
algorithm on the input).

For example, every deterministic (1+1) memory-restricted ranking-based algorithm that
ever rejects a search point (i.e., does not move to the newly sampled search point) will be caught
in an infinite loop on OneMax with positive probability if the input is chosen uniformly at
random. Hence, such an algorithm has infinite expected runtime. On the other hand, if
the algorithm never rejects a search point, then its sequence of queries does not depend on the
input, and it is easy to see that its expected runtime on OneMax is Ω(2^n). However, there are
certainly (1+1) memory-restricted ranking-based randomized algorithms (e.g., RLS) that optimize
OneMax in expected time O(n log n). We refer the reader to [DL15a] for a more detailed
discussion. To resolve this apparent non-applicability of Yao’s Principle, we apply it to a
suitable superset of algorithms. In particular, Yao’s principle applies to every set of
algorithms that have access to their whole search histories.
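The memory-restricted randomized algorithm mentioned above can be illustrated as follows. This is a minimal sketch of standard RLS on OneMax_z (flip one uniformly chosen bit, keep the offspring unless it is worse), not a transcription of Algorithm 2; the helper names and the explicit hidden optimum `z` are illustrative assumptions. The code stores only the current search point and only compares f(y) with f(x), so it is a (1+1) memory-restricted ranking-based algorithm.

```python
import random


def onemax(x, z):
    """OneMax_z(x): number of positions in which x agrees with the hidden optimum z."""
    return sum(1 for a, b in zip(x, z) if a == b)


def rls(z, rng):
    """Randomized Local Search on OneMax_z.

    Only the current search point is stored, and the offspring is kept whenever
    it is not worse. Returns the final search point and the number of evaluations.
    """
    n = len(z)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, evaluations = onemax(x, z), 1
    while fx < n:
        y = x[:]
        y[rng.randrange(n)] ^= 1   # flip one uniformly random bit
        fy = onemax(y, z)
        evaluations += 1
        if fy >= fx:               # accept unless strictly worse
            x, fx = y, fy
    return x, evaluations


rng = random.Random(1)
z = [rng.randint(0, 1) for _ in range(100)]
x, evaluations = rls(z, rng)
assert x == z
```

Averaged over many runs, `evaluations` should grow roughly like n ln n, in line with the O(n log n) bound quoted above.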

4.3 Negative Drift 

We recall the Negative Drift Theorem as given in [OW11].

Theorem 5 (Negative Drift Theorem [OW11]). Let $X_t$, $t \ge 0$, be real-valued random variables
describing a stochastic process over some state space, with filtration $\mathcal{F}_t := (X_0, \dots, X_t)$.
Suppose there exists an interval $[a, b] \subseteq \mathbb{R}$, two constants $\delta, \varepsilon > 0$ and, possibly depending
on $\ell := b - a$, a function $r(\ell)$ satisfying $1 \le r(\ell) = o(\ell/\log \ell)$ such that for all $t \ge 0$ the
following two conditions hold:

1. $E[X_t - X_{t+1} \mid \mathcal{F}_t \wedge a < X_t < b] \le -\varepsilon$,

2. $\Pr[|X_{t+1} - X_t| \ge j \mid \mathcal{F}_t \wedge a < X_t] \le r(\ell)/(1+\delta)^j$ for $j \in \mathbb{N}_0$.

Then there is a constant $c^* > 0$ such that for $T^* := \min\{t \ge 0 : X_t \le a \mid \mathcal{F}_0 \wedge X_0 \ge b\}$ it
holds that $\Pr[T^* \le 2^{c^*\ell/r(\ell)}] = 2^{-\Omega(\ell/r(\ell))}$.

5 Lower Bounds 

In this section we show that the (1+1) memory-restricted ranking-based black-box complexity
of OneMax is at least Ω(n). In fact, we show this bound for a large range of function
classes. We also show (mostly tight, as the algorithms in subsequent sections will show)
lower bounds for general (μ + λ) elitist black-box algorithms.

We use Yao’s Principle (Lemma 4 in Section 4.2). However, as outlined in Section 4.2,
Yao’s Principle is not directly applicable to memory-restricted or elitist black-box algorithms.
Still, we can apply Yao’s Principle to a suitable superset of algorithms, yielding the following
bounds.

Theorem 6. Let 𝓕 be a class of functions such that for every z ∈ {0,1}^n there is a function
f_z ∈ 𝓕 with unique optimum z. Then the (1+1) memory-restricted ranking-based black-box
complexity of 𝓕 (and thus also the elitist (1+1) Las Vegas black-box complexity) is at least
n − 1. Moreover, for every 0 < p < 1 the p-Monte Carlo black-box complexity of 𝓕 is at least
n + ⌈log(1 − p)⌉.

In general, for every μ ≥ 1 and λ ≥ 1, the following statements are true for the memory-restricted
ranking-based black-box complexity, for the elitist Las Vegas black-box complexity,
and for the elitist Monte Carlo black-box complexity.

• The (1 + λ) black-box complexity of 𝓕 is at least n/log(λ + 1) − O(1).

• The (μ + 1) black-box complexity of 𝓕 is at least n/log(2μ + 1) − O(1).

• The (μ + λ) black-box complexity of 𝓕 is at least n/(b + o(1)), where
$b = \log\binom{\mu+\lambda}{\mu} + \mu(\log\mu - 1 - \log\ln 2) - 1$.

Proof of Theorem 6.¹ We first give the argument for the (1+1) case to illustrate the main
idea, although this case is covered by the more general (1 + λ) case. We use Yao’s Principle
on the set 𝒜' of all algorithms A satisfying the following restrictions. A is a comparison-based
(1+1) black-box algorithm that has access to the whole search history. (Thus we may
apply Yao’s Principle, see Section 4.2.) The algorithm learns about f by oracle queries of
the following form. It may choose a search point x that it has queried before (in the first
round, it simply chooses a search point without querying) and a search point y. Then A may
choose a subset S of {“<”, “=”, “>”}, and the oracle will return yes if the relation between
f(x) and f(y) is in S, and no otherwise. For example, if S = {“<”, “=”}, then the oracle
answers the question “Is f(x) ≤ f(y)?”.

¹ The extended abstract [DL15b] published at GECCO contains a proof that covers only the elitist case,
but is more intuitive and less technical. We advise readers who are only interested in the proof ideas to
read that proof rather than the general version given here.

Let 𝒜 be the set of all (1+1) memory-restricted ranking-based black-box algorithms.
We need to show 𝒜 ⊆ 𝒜', so let A ∈ 𝒜. When the current search point of A is x, the
algorithm may first decide on the next search point y (i.e., it assigns to each search point y
a probability p_y to be queried). If the oracle (of model 𝒜) tells the algorithm “f(x) < f(y)”,
then A may choose to stay in x with some probability p_< and to go to y with probability
1 − p_<. Similarly, let p_= and p_> be the probabilities that the algorithm stays in x if the oracle
responds “f(x) = f(y)” or “f(x) > f(y)”, respectively.

We may simulate A in the model 𝒜' as follows. We first choose the point y with probability
p_y, as A does. Then we set S to be {“<”, “=”, “>”} with probability p_< · p_= · p_>, the
set {“<”, “=”} with probability p_< · p_= · (1 − p_>), and so on (i.e., for every symbol in S we
include the corresponding factor p, and for every symbol not in S we include the corresponding
factor 1 − p). If the answer to our query is yes, then we stay at x, and if the answer is no,
then we go to y. Note that the marginal probability that “<” ∈ S is p_<, so the probability to
stay in x conditioned on f(x) < f(y) is also p_<, and similarly for “=” and “>”. Hence, by an
easy case distinction on whether f(x) is less than, equal to, or larger than f(y), we find that in all
cases the probability of going to y is the same as for the algorithm A. Thus we can simulate
A in the model 𝒜'.
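The simulation argument can be checked exactly for concrete stay probabilities. In the sketch below, the helper names are hypothetical and the probabilities are arbitrary; each relation r is put into S independently with probability p_r, and exact fractions confirm that the probability of the answer yes, given the true relation r, equals p_r.

```python
from fractions import Fraction
from itertools import product

RELATIONS = ("<", "=", ">")


def subset_distribution(stay):
    """Distribution over query sets S: relation r is put into S independently
    with probability stay[r], where stay[r] is A's probability to stay at x."""
    dist = {}
    for inside in product((True, False), repeat=len(RELATIONS)):
        S = frozenset(r for r, flag in zip(RELATIONS, inside) if flag)
        prob = Fraction(1)
        for r, flag in zip(RELATIONS, inside):
            prob *= stay[r] if flag else 1 - stay[r]
        dist[S] = prob
    return dist


def stay_probability(dist, relation):
    """Probability of the answer 'yes' (stay at x) when the true relation is given."""
    return sum(prob for S, prob in dist.items() if relation in S)


stay = {"<": Fraction(1, 3), "=": Fraction(1, 2), ">": Fraction(9, 10)}
dist = subset_distribution(stay)
assert sum(dist.values()) == 1
for r in RELATIONS:
    assert stay_probability(dist, r) == stay[r]  # same behaviour as A
```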

It remains to prove a lower bound on the 𝒜'-complexity of 𝓕. By Yao’s Principle, it
suffices to prove such a bound for the expected runtime of every deterministic algorithm A ∈
𝒜' on a randomly chosen function. We regard a distribution on 𝓕 where for each z ∈ {0,1}^n
exactly one function with optimum z has probability 2^{−n} to be drawn, and all other functions
in 𝓕 have zero probability. Note that the 𝒜'-oracle gives only two possible answers (one bit
of information) to each query. By a standard information-theoretic argument [DJW06], we
show that the probability that the i-th query of A is the optimum is at most 2^{−n+i−1}. More
precisely, observe that after i − 1 queries we can distinguish at most 2^{i−1} cases, so that on
average 2^{n−i+1} search points are still possible optima. By the choice of our distribution,
each one of them is equally likely to be the optimum of the function f. Let c_j be the number of
search points that are still possible in the j-th case, for 1 ≤ j ≤ 2^{i−1}. Since Pr[case j] = c_j 2^{−n},
the probability of hitting the optimum is

$$\sum_{j:\,\Pr[\text{case } j]>0} c_j^{-1} \cdot \Pr[\text{case } j] = \sum_{j:\,\Pr[\text{case } j]>0} 2^{-n} \le 2^{-n+i-1}.$$

Furthermore, by the union bound, the probability that the optimum is among the first
i queries is at most $\sum_{j=1}^{i} 2^{-n+j-1} < 2^{i-n}$. This immediately implies the statement on the
Monte Carlo complexity. For the other complexities, the claim follows by observing that the
number T of queries to find the optimum has expectation

$$E[T] \ge \sum_{i=0}^{n-1} \Pr[T > i] \ge \sum_{i=0}^{n-1} \left(1 - 2^{i-n}\right) = n - 2^{-n}\sum_{i=0}^{n-1} 2^{i} = n - \left(1 - 2^{-n}\right) > n - 1.$$
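The final chain of (in)equalities can be verified with exact arithmetic. The short check below (function name hypothetical) sums 1 − 2^{i−n} for i = 0, ..., n − 1 and compares the result with the closed form n − (1 − 2^{−n}).

```python
from fractions import Fraction


def runtime_lower_bound(n):
    """Exact value of sum_{i=0}^{n-1} (1 - 2^(i-n)) from the bound on E[T]."""
    return sum(1 - Fraction(2**i, 2**n) for i in range(n))


for n in range(1, 40):
    value = runtime_lower_bound(n)
    assert value == n - (1 - Fraction(1, 2**n))  # closed form
    assert value > n - 1
```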

For the (1 + λ) case with λ > 1, we consider the following set 𝒜' of algorithms, which
have access to their complete search history. We require the algorithm to partition the set
of weak orderings of λ + 1 elements (i.e., orderings with potentially equal elements) into
λ + 1 subsets S_1, ..., S_{λ+1}, and the oracle tells the algorithm to which subset the ordering
of the fitnesses of the λ + 1 search points belongs. Then each (1 + λ) memory-restricted
ranking-based black-box algorithm A can be simulated in this model. More precisely, fix
λ + 1 search points y_1, ..., y_{λ+1} (where y_1 is the parent individual and y_2, ..., y_{λ+1} are the
offspring). Then for each weak ordering σ of these search points, let p_1(σ), ..., p_{λ+1}(σ) be the
probabilities that A selects the first, second, ..., (λ + 1)-st search point, respectively, if they are
ordered according to σ. Then we can simulate A by choosing a partition (S_1, ..., S_{λ+1})
with probability

$$p(S_1, \dots, S_{\lambda+1}) = \prod_{i=1}^{\lambda+1} \prod_{\sigma \in S_i} p_i(\sigma).$$

In this way, for every ordering σ of the λ + 1 search points and for every 1 ≤ i ≤ λ + 1, the
marginal probability that σ ∈ S_i is p_i(σ). Thus, if the oracle tells us that the ordering of the
search points is in the i-th part, then we select the search point y_i. In this way, we have
the same probability p_i(σ) as A to proceed to search point y_i. Hence, we can simulate A in
this model.
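The construction can be sketched in code for small λ. The enumeration below lists all weak orderings of λ + 1 search points as normalised rank vectors, then draws a partition (S_1, ..., S_{λ+1}) by assigning each ordering σ independently to part i with probability p_i(σ), which is exactly the product distribution above. The selection rule `select_best` (choose uniformly among the best-ranked points) is a hypothetical example of an algorithm A, not one taken from the paper.

```python
import random
from itertools import product


def weak_orderings(m):
    """All weak orderings of m labelled search points, as rank vectors whose
    used ranks are exactly 0, 1, ..., r-1 (equal ranks mean equal fitness,
    a larger rank means a larger fitness)."""
    result = []
    for ranks in product(range(m), repeat=m):
        used = sorted(set(ranks))
        if used == list(range(len(used))):
            result.append(ranks)
    return result


def select_best(sigma):
    """Hypothetical A: select uniformly among the points of maximal rank."""
    top = max(sigma)
    winners = [i for i, r in enumerate(sigma) if r == top]
    return [1 / len(winners) if r == top else 0.0 for r in sigma]


def sample_partition(orderings, select_prob, rng):
    """Put each sigma into part i with probability select_prob(sigma)[i]."""
    m = len(orderings[0])
    parts = [set() for _ in range(m)]
    for sigma in orderings:
        i = rng.choices(range(m), weights=select_prob(sigma))[0]
        parts[i].add(sigma)
    return parts


def simulated_step(true_sigma, parts):
    """The oracle reveals the part containing the true ordering; we select y_i."""
    return next(i for i, part in enumerate(parts) if true_sigma in part)


lam = 2
orderings = weak_orderings(lam + 1)            # all weak orderings of 3 points
parts = sample_partition(orderings, select_best, random.Random(3))
for sigma in orderings:
    chosen = simulated_step(sigma, parts)
    assert sigma[chosen] == max(sigma)          # this A only ever selects a best point
```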

In order to prove a lower bound for 𝒜', we employ Yao’s principle as in the (1+1) case.
Note that the algorithm learns log(λ + 1) bits per query. Similarly as before, the probability
that the i-th query of a deterministic algorithm is the optimum is at most (λ + 1)^{i−1} 2^{−n}, and
a similar calculation as before shows that Pr[T ≤ i] < (λ + 1)^i 2^{−n} and E[T] ≥ n/log(λ + 1) − O(1).

For μ > 1 and λ = 1, we learn the position of the new search point among the μ previous
search points. There are at most 2μ + 1 positions for the new search point (its fitness may
equal the fitness of one of the other search points, or it may lie between two consecutive ones,
below all of them, or above all of them). Thus we learn at most log(2μ + 1) bits of information
per query, and we can derive the complexities in the same manner as before.
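The count 2μ + 1 can be checked by brute force. In the sketch below (names hypothetical), the μ old fitness values are assumed to be pairwise distinct, which is the case in which the most information is revealed; a new value either coincides with one of them or falls into one of the μ + 1 gaps they define.

```python
from math import log2


def relative_position(new, old):
    """Position of `new` relative to the sorted, pairwise distinct values `old`:
    ("eq", j) if new == old[j], ("gap", j) if new lies just below old[j],
    and ("gap", len(old)) if new is larger than all old values."""
    for j, value in enumerate(old):
        if new == value:
            return ("eq", j)
        if new < value:
            return ("gap", j)
    return ("gap", len(old))


def count_outcomes(mu):
    """Number of distinguishable positions for the new fitness value."""
    old = [2 * j for j in range(mu)]  # distinct fitness values 0, 2, ..., 2mu - 2
    return len({relative_position(w, old) for w in range(-1, 2 * mu)})


for mu in range(1, 20):
    assert count_outcomes(mu) == 2 * mu + 1
print(f"mu = 4: at most {log2(count_outcomes(4)):.3f} bits per query")
```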

If both μ and λ are larger than 1, then there are at most $\binom{\mu+\lambda}{\mu}$ ways to select μ out of
μ + λ search points, and there are $F_\mu = (1+o(1))\,\mu!\,(\ln 2)^{-\mu-1}/2$ weak orderings on these μ
elements (i.e., orderings with potentially equal elements), where F_μ is the μ-th ordered Bell
number [Gro62]. Hence, the algorithm can learn at most $b := \log\big((1+o(1))\binom{\mu+\lambda}{\mu}\,\mu!\,(\ln 2)^{-\mu-1}/2\big)$
bits per query. Since $b = \log\binom{\mu+\lambda}{\mu} + \mu(\log\mu - 1 - \log\ln 2) - \log 2 + o(1)$, this implies the
claim in the same way as before. □
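The ordered Bell numbers and the resulting information per query can be computed directly. The sketch below (function names hypothetical) uses the standard recurrence F_m = Σ_{k=1}^{m} binom(m, k) F_{m−k}, compares F_m with its leading-order approximation m!/(2 (ln 2)^{m+1}), and evaluates the exact number of bits log(binom(μ+λ, μ) F_μ) per query, with log taken to base 2, rather than the simplified expression for b.

```python
from math import comb, factorial, log, log2


def fubini(m):
    """Ordered Bell number F_m, i.e. the number of weak orderings of m elements."""
    F = [1]
    for j in range(1, m + 1):
        F.append(sum(comb(j, k) * F[j - k] for k in range(1, j + 1)))
    return F[m]


def fubini_approx(m):
    """Leading-order approximation m! / (2 (ln 2)^(m+1))."""
    return factorial(m) / (2 * log(2) ** (m + 1))


def bits_per_query(mu, lam):
    """log2 of the number of answers: which mu of the mu + lam points survive,
    and how the survivors are weakly ordered."""
    return log2(comb(mu + lam, mu) * fubini(mu))


assert [fubini(m) for m in range(6)] == [1, 1, 3, 13, 75, 541]
for m in range(8, 30):
    assert abs(fubini(m) / fubini_approx(m) - 1) < 1e-6
n = 1000
for mu, lam in [(2, 2), (5, 3), (10, 10)]:
    print(mu, lam, round(n / bits_per_query(mu, lam)))  # leading term of the lower bound
```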
Note that the lower bounds given by Theorem 6 are by a constant factor stronger than
the lower bounds for general comparison-based algorithms (that are not memory-restricted)
that learn all comparisons among the μ + λ search points. For example, in the classical
case where we may compare exactly two search points (corresponding to the (1+1) case), we
only get a lower bound of n/log 3 − O(1) instead of n − 1. Intuitively speaking, the reason
is that a comparison-based algorithm may use the three possible outcomes “larger”, “less”,
or “equal” of a comparison, while memory-restricted comparison-based algorithms only get
two outcomes, “stay at x” or “advance to y”.

We remark that the analysis for μ > 1 can be tightened in several ways. Firstly, for
the elitist (μ + 1) black-box complexity, we only have 2μ cases instead of 2μ + 1, since we
cannot, sloppily speaking, distinguish between the case that the new search point is
discarded because it has worse fitness than the worst of the μ old ones and the case that it is
discarded because it has equal fitness to the worst of the μ old search points. Moreover,
for all black-box models under consideration we learn log(2μ + 1) bits of information in the
i-th round only if all previous search points have different fitnesses; otherwise, we get less
information. In particular, if the new search point has fitness equal to one of the old fitnesses,
then with the next query we get less information. Also for the case μ > 1 and λ > 1, the
bound in Theorem 6 can be tightened at the cost of a more technical argument.

6 The (2+1) Elitist Black-Box Complexity of OneMax 

For the (2 + 1) elitist black-box complexity, a simple algorithm shows that the complexity is
at most n + 1. The algorithm is deterministic, so it provides an upper bound on both the
Monte Carlo complexity and the Las Vegas complexity.

Theorem 7. The (Monte Carlo and Las Vegas) (2+1) elitist black-box complexity of OneMax
is at most n + 1.

Proof. Throughout the algorithm, we maintain the invariant that in the i-th step we have
two strings x_i and x'_i that are both optimal in the first i bits, that are both zero on bits
i + 2, ..., n, and that differ on bit i + 1 (one of them is 0, the other is 1).

We thus start with the all-zero string x_0 = (0, ..., 0) and the string x'_0 = (1, 0, ..., 0).
Given x_i and x'_i, take the string with the smaller fitness (say x'_i; the two fitnesses differ, since
the strings differ in exactly one bit), and flip both the (i + 1)-st and the (i + 2)-nd bit in it, giving
a string x'_{i+1}. (The index i is determined by x_i and x'_i.) Since the (i + 1)-st bit in x'_i was
incorrect, the fitness of x'_{i+1} is at least as high as the fitness of x'_i, and we may thus replace
x'_i by x'_{i+1}. The invariant is maintained with x_{i+1} = x_i, since both x_{i+1} and x'_{i+1} are optimal
on the (i + 1)-st bit (and on all previous bits by induction). In this way, after n − 1 steps the
population contains an optimal search point, and at most n + 1 fitness evaluations (two for the
initial strings and one per step) are needed. □
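The strategy from the proof can be run directly. The sketch below assumes a OneMax_z oracle with hidden optimum z (the helper names are hypothetical), uses 0-based indices, and checks exhaustively for small n that the optimum is always found with exactly n + 1 evaluations.

```python
from itertools import product


def onemax(x, z):
    """OneMax_z(x): number of positions in which x agrees with the hidden optimum z."""
    return sum(1 for a, b in zip(x, z) if a == b)


def two_plus_one(z):
    """Deterministic (2+1) elitist strategy from the proof of Theorem 7.

    Returns the best search point found and the number of fitness evaluations.
    """
    n = len(z)
    x = [0] * n
    y = [1] + [0] * (n - 1)            # x and y differ exactly in index 0
    fx, fy = onemax(x, z), onemax(y, z)
    evaluations = 2
    for i in range(n - 1):             # invariant: x, y correct before index i, differ at i
        if fx < fy:                    # let x be the better string (no ties possible)
            x, y, fx, fy = y, x, fy, fx
        child = y[:]
        child[i] ^= 1                  # repair the wrong bit of the worse string
        child[i + 1] ^= 1              # the child now differs from x only at index i + 1
        fc = onemax(child, z)
        evaluations += 1
        assert fc >= fy                # the child is not worse than the parent it replaces
        y, fy = child, fc
    return (x if fx >= fy else y), evaluations


for n in range(1, 9):
    for z in product((0, 1), repeat=n):
        best, evaluations = two_plus_one(list(z))
        assert best == list(z) and evaluations == n + 1
```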
7 The (1+1) Elitist Black-Box Complexity of OneMax

We start with a high level overview of the algorithm in Section 7.1. Some tools needed for 
its runtime analysis are presented in Section 7.2, while the formal analysis is carried out in 
Section 7.3. 

7.1 Overview 

While the algorithm and the analysis in Section 6 are rather straightforward, the analysis
for the (1+1) situation is considerably more difficult. In fact, we do not know the Las Vegas
black-box complexity of OneMax for (1+1) elitist algorithms. As we shall discuss below, if
we had just one additional bit that we could manipulate in an arbitrary way, we could show
that it is of linear order, but we do not know how to create such a bit. Still, the ideas
behind that algorithm yield a linear Monte Carlo black-box complexity. By the lower bound
of Theorem 6, this is best possible.

Theorem 8. The Monte Carlo (1+1) elitist black-box complexity of OneMax is Θ(n).

The lower bound in Theorem 8 follows from Theorem 6. We thus concentrate in the 
following on the upper bound. As in previous works on black-box complexities for OneMax, 
in particular the memory-restricted algorithm from [DW14a], we will use some parts of the 
bit string for storing information about the search history. 

The main idea of the algorithm is similar to the one of the previous section. That is, we
aim at optimizing one bit at a time. Since we can no longer encode the current iteration
in the population, we instead implement a counter which tells us which bit is to be tested
next. The main difficulty is in (i) designing a counter that does not affect the fitness of
the bit string, and (ii) optimizing a bit with certainty in constant time. As we shall see in
Section 7.2, a counter can be implemented by reserving O(log n) bits of the string exclusively
for this counter, solving (i). Point (ii) can be solved if we have access to a small pool of
non-optimal bits (which we call trading bits). The key idea is that throughout the algorithm,
in expectation, we gain more trading bits than we spend, so we never run out of trading bits.
The main steps of the algorithm verifying Theorem 8 are thus as follows.

1. Create a neutral counter for counting numbers from 1 to n. 

2. Create a pool of ω(log n) trading bits, all of which are non-optimal.

3. Using the trading bits, optimize the remaining string (the part unaffected by the 
counter) by testing one bit after the other. Use the counter to indicate which bit 
to test next. At the same time, try to recover trading bits if possible. 

4. Using RLS (Algorithm 2), optimize the part which had been used as a counter. We
use bit b_0 as a flag bit to indicate that we are in Step 4.

5. Optimize b_0.
The technically most challenging parts are Step 1 and Step 3. But, interestingly enough, the
key problem in turning the Monte Carlo algorithm into a Las Vegas one lies in separating
Step 5 from Step 4: we need to test every once in a while during the fourth phase whether
or not bit b_0 is optimal. If we test too early, that is, before Step 4 is finished, it may happen
that we have to accept this offspring and thus misleadingly assume that we are in one of
the first three steps, causing the algorithm to fail. Note, though, that this problem could be
ignored completely if we had just one bit that we could manipulate as we want (i.e., without
having to use elitist operations).

Due to all the necessary preparation, the formal proof of Theorem 8 will be postponed 
to Section 7.3. 

7.2 Tools for Proving Upper Bounds 

In this section we collect tools that are common to the algorithms of the subsequent sections.
All the following operations will be Monte Carlo operations, i.e., they have some probability
of failure. Recall from Remark 1 that if we have an algorithm A for a set of functions 𝓕
and a “failure event” E_fail of probability p_fail such that, conditioned on ¬E_fail, the algorithm
A succeeds after time T with probability at least 1 − (p − p_fail), then the p-Monte Carlo
complexity of A on 𝓕 is at most T. In particular, if conditioned on ¬E_fail the algorithm A
succeeds after expected time T, then by Markov’s inequality it succeeds after time at most
(p − p_fail)^{−1} T with probability at least 1 − (p − p_fail). Therefore, the p-Monte Carlo complexity
of A on 𝓕 is at most O(T) for any p > p_fail with p − p_fail ∈ Ω(1).

7.2.1 Copying or Overwriting Parts of the String 

Our first operation will be a copy operation. If we have a large part B of the string with
a constant fraction of non-optimal bits, then we can efficiently copy a small substring into
a new position by flipping some non-optimal bits of B. After the operation, B is still of a
form that may be used for further copy operations, except that the number of non-optimal
bits in B has decreased. Note that a string drawn uniformly at random of, say, length n/2
may serve as B, since with high probability roughly half of the bits in B will be correct.

Lemma 9. Assume we have a set B of b known bit positions, of which at least b_0 = βb
bits are non-optimal, for some β > 0, and that the positions of the non-optimal bits are uniformly
at random in B. Assume further that we have two sets C, C' of bit positions such that
|C| = |C'| ≤ b_0/2, and that B, C, C' are pairwise disjoint.

There is a (1+1) elitist black-box strategy that copies the bits from C into C'. For any
c > 0, this algorithm requires at most c · |C| · log(n)/β iterations with probability 1 − …

After the copy operation, at least b_0 − |C| bits in B will be non-optimal, and their positions
will be uniformly at random in B.

The same strategy can be used to overwrite C' with a fixed string (e.g., (1, ..., 1)).

Proof. We perform the following operation until C and C' are equal. Assume that in the
current search point x the first i − 1 bits of C and C' coincide, but the i-th bits differ, for
some i > 0. Sample a new search point x' by flipping the i-th bit of C' and a random bit in
B. Accept x' if f(x') ≥ f(x). Note that exactly one accepted step, and thus at most one flip of
a non-optimal bit of B, is made for every bit of C that we copy, so at all times at least
b_0 − |C| ≥ b_0/2 bits in B are non-optimal.

If the i-th bit of C' was non-optimal, we accept x' in any case. Otherwise, we accept it if
the random bit in B was non-optimal, which happens with probability at least b_0/(2b) = β/2.
Thus we need in expectation at most 2/β trials to copy the i-th bit, proving that the expected
runtime is at most O(|C|/β). By the Chernoff bounds, the runtime is more than c|C| log(n)/β
with probability at most …. Finally, since we choose the bits in B uniformly at random,
the positions of the non-optimal bits in B are uniformly at random after each step. □
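The copy operation can be simulated on an explicit OneMax_z instance. In the sketch below, the hidden optimum `z`, the block layout, and the helper name `copy_block` are illustrative assumptions; the loop flips the first differing bit of C' together with a uniformly random bit of B and accepts the offspring if it is not worse.

```python
import random


def copy_block(x, z, B, C, C_prime, rng):
    """Copy the bits at positions C into positions C_prime (Lemma 9 strategy).

    Works in place on x; B lists the payoff positions. Returns the number of
    iterations. Assumes B keeps enough non-optimal bits, otherwise it may not stop.
    """
    def f(s):
        return sum(1 for a, b in zip(s, z) if a == b)   # OneMax_z

    fx, iterations = f(x), 0
    for c, c_prime in zip(C, C_prime):
        while x[c_prime] != x[c]:
            y = x[:]
            y[c_prime] ^= 1            # flip the first differing bit of C'
            y[rng.choice(B)] ^= 1      # and one uniformly random bit of B
            iterations += 1
            fy = f(y)
            if fy >= fx:               # elitist (1+1) acceptance
                x[:] = y
                fx = fy
    return iterations


rng = random.Random(2)
n = 200
z = [rng.randint(0, 1) for _ in range(n)]
B, C, C_prime = list(range(150)), list(range(150, 160)), list(range(160, 170))
x = [rng.randint(0, 1) for _ in range(n)]
for j in B:                            # exactly half of B is non-optimal
    x[j] = z[j] if j % 2 else 1 - z[j]
before = sum(1 for a, b in zip(x, z) if a == b)
copy_block(x, z, B, C, C_prime, rng)
assert [x[j] for j in C] == [x[j] for j in C_prime]
assert sum(1 for j in B if x[j] != z[j]) >= 75 - len(C)
assert sum(1 for a, b in zip(x, z) if a == b) >= before   # fitness never decreased
```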

7.2.2 Reliable Optimization 

In this section we give a routine that allows us to make sure, with very high probability, that
some small part of the string is optimal.

Lemma 10. For every 0 < p < 1/2 with p = e^{−o(n)}, there is ℓ = O(log(1/p)) such that the
following holds for all β > 0 and k ∈ ℕ. Let x be a bit string in {0,1}^n such that x_1 = x_2 =
... = x_ℓ = 0, and assume that in the remaining string there is a block B of known position
of size at least 2ℓ/β such that at least a β fraction of the bits in B are non-optimal, their
positions distributed uniformly at random in B. Moreover, let C be a block of size k that is
disjoint from x_1, ..., x_ℓ and from B.

Then there exists a (1+1) elitist black-box strategy that with probability at least 1 − p
optimizes C in time O(ℓ k log k/β) and marks termination by setting x_ℓ to 1. The algorithm
will optimize at most ℓ random bits of B by copy operations as in Lemma 9.

Note that Lemma 10 can be achieved with trivial algorithms (e.g., RLS) if we do not
insist that the algorithm marks termination. This is an important part, since knowing when
a phase has finished will be a crucial ingredient for further algorithms. We remark that the
bits x_1, ..., x_ℓ in Lemma 10 may be replaced by any bits as long as their positions are known.
We remark further that the requirement p = e^{−o(n)} can be replaced by p = Ω(e^{−cn}) for some
suitably chosen constant c > 0. For our purposes the claimed setting suffices.

Proof. We will use the number of one-bits among x_1, ..., x_ℓ as an estimator for the time
that we have already spent. In each step, with probability 1 − 1/(3k log k) we perform an RLS
step (Randomized Local Search, Algorithm 2) on C. Otherwise, we flip the first of the bits
x_1, ..., x_ℓ that is still zero, and we simultaneously flip a random bit in B.

By our assumption made above, at most ℓ bits of B will be flipped, so during the whole
algorithm each one of them is non-optimal with probability at least (β|B| − ℓ)/|B| ≥ β/2.
Thus in each step we successfully flip one of the bits x_1, ..., x_ℓ with probability at least
p' := β/(6k log k). By the Chernoff bound, the probability that after n' := 12ℓk log k/β steps
we have not flipped all of them is at most

$$\Pr[\mathrm{Bin}(n', p') < \ell] \le e^{-\Omega(\ell)} \le p/3$$

for a suitable choice of ℓ = O(log(1/p)). Thus the algorithm terminates after at most n'
steps with probability 1 − p/3.

Let us split the execution into ℓ rounds, where the i-th round is characterized by x_1 =
... = x_{i−1} = 1 and x_i = ... = x_ℓ = 0. Let us call a round which takes at least 2k log k
RLS steps on C a long round. Since the number of RLS steps in each round is geometrically
distributed, a round is long with probability at least

$$\left(1 - \frac{1}{3k\log k}\right)^{2k\log k} \ge \left(1 - \frac{1}{3}\right)^{2} = \frac{4}{9},$$

since the function (1 − 1/x)^x is monotonically increasing in x > 1. Thus, by the Chernoff
bound there are at least 2ℓ/9 long rounds with probability at least 1 − e^{−Ω(ℓ)} ≥ 1 − p/3
for a suitable choice of ℓ. On the other hand, the probability that C is not optimized
in a long round is at most 1/k (this is an application of the coupon collector problem,
see [Doe11, Theorem 1.23]). So the probability that C is not optimized in any of the Ω(ℓ) long
rounds is at most k^{−Ω(ℓ)} ≤ p/3 for a suitable choice of ℓ. Summarizing, with probability
1 − p, the algorithm succeeds in time at most O(ℓk log k/β). □
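The bound on the probability of a long round can be checked numerically. The sketch below assumes that log denotes the binary logarithm (the argument only needs 3k log k ≥ 3) and evaluates (1 − 1/(3k log k))^{2k log k} for a range of k; the function name is hypothetical.

```python
from math import exp, log2


def long_round_probability(k):
    """(1 - 1/(3 k log k))^(2 k log k): probability that the first 2k log k
    steps of a round are all RLS steps."""
    x = 3 * k * log2(k)
    return (1 - 1 / x) ** (2 * x / 3)


values = [long_round_probability(k) for k in range(2, 1000)]
assert all(v >= 4 / 9 for v in values)
assert all(a <= b for a, b in zip(values, values[1:]))  # increasing in k
assert abs(values[-1] - exp(-2 / 3)) < 1e-3              # approaches e^(-2/3)
```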


7.2.3 A Neutral Counter 

Next we show that it is possible to set up a counter in such a way that increasing the counter
does not affect the OneMax value of the string. The counter can be implemented in the
(1+1) elitist black-box model, and is hence applicable in any (μ + λ) elitist black-box model.

Lemma 11 (Neutral Counter). For every 0 < p < 1/2 with p = e^{−o(n)}, there is ℓ = O(log(1/p))
such that the following holds. Let x be a bit string in {0,1}^n such that x_1 = x_2 = ... =
x_{ℓ+2} = 0 and (x_{ℓ+3}, ..., x_n) is uniformly distributed in {0,1}^{n−ℓ−2}.

Then there exists a (1+1) elitist black-box strategy that with probability at least 1 − p
implements in x a counter which can be used during future iterations without changing the
OneMax value of the string. For counting up to j = O(n), the counter requires a total
number of O(log j) bits that are blocked in all iterations in which the counter is active. The
setup of the counter requires O(ℓ log j log log j) function evaluations. During the setup of
the counter, O(log j + ℓ) random bits of the remainder of the string are optimized by copy
operations as in Lemma 9.

Proof. In all that follows, we use a partition C, C', and B of [n] \ [ℓ + 2], thus splitting the
string x (minus the first ℓ + 2 bits) into three parts, which by abuse of notation we also call
C, C', and B. The sizes of C and C' are k = O(log n) each (see below), so the size of B is
n − o(n). By assumption, the entries in B are initialized uniformly at random. Note that by
the Chernoff bound, with exponentially high probability at least a 1/3 fraction of the bits
in B are non-optimal. We will henceforth assume that this is the case (giving the algorithm
a failure probability of at most p/2). As we will see, the counter algorithm will use at most
2|C| + ℓ + 2 “payoff bits” which are flipped from a non-optimal into the correct state, so that
at any point in time during the algorithm at least a 1/3 − o(1) fraction of the bits in B will be
non-optimal.
Let k be the smallest even integer such that $\binom{k}{k/2} \ge j$. Note that k = O(log j), since
$\binom{k}{k/2} = \Theta(2^k/\sqrt{k})$ by Stirling’s formula. We use Lemma 10 to optimize block C with probability
1 − p/2, using x_3, ..., x_{ℓ+2} as flag bits.

Once we have a string in the memory which satisfies x_1 = x_2 = 0 and x_{ℓ+2} = 1, we
assume that part C is optimized and we copy the entries of C into part C' using Lemma 9
with part B as payoff bits. As soon as C has been copied into C', we want to change the
second flag bit. We do this by flipping x_2 plus a random bit in B until the corresponding
string is accepted. The flag 011 in the first three positions tells us to move on to initializing
the counter.

We fix an enumeration of all the $\binom{k}{k/2}$ possible ways to set exactly k/2 out of the k entries
to their correct values. Let r_1, ..., r_j be the first j strings corresponding to this enumeration.
For initializing the counter to one, we copy the string r_1 into C by applying Lemma 9, again
with part B as payoff bits. When we have initialized the counter, we finally flip the flag bit
x_1 (together with a random bit in B) to indicate that the counter is ready.

Note that throughout the whole algorithm, we use at most 2|C| + ℓ + 2 payoff bits, as
we claimed at the beginning of the proof. Note also that if at least a 1/3 fraction of the bits in B
is non-optimal, the second and the third phase are Las Vegas operations (they can never fail,
but the time needed for these phases is random).

Since the optimal entries of C are stored in C' (the bits in C' will not be touched as long
as the counter is active), we can at any time read the value of the counter by comparing
C with C'. Similarly, if we want to increase the counter from some value z to z + 1, we
simultaneously flip those bits of C in which r_z and r_{z+1} differ. Since exactly k/2 entries are
correct in either of the two strings r_z and r_{z+1}, this does not affect the OneMax value of the
string. □
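The neutral counter can be sketched on a single block. The code below is an illustration under our own assumptions: the names are hypothetical, and the optimal block values are passed in explicitly in place of the stored copy C'. It chooses the smallest even k with binom(k, k/2) ≥ j, lists the first j strings with exactly k/2 correct entries, and checks that each increment flips an even number of bits and leaves the OneMax value of the block unchanged.

```python
from itertools import combinations
from math import comb


def counter_length(j):
    """Smallest even k >= 2 with binom(k, k/2) >= j."""
    k = 2
    while comb(k, k // 2) < j:
        k += 2
    return k


def counter_strings(opt, j):
    """r_1, ..., r_j: strings that agree with the optimal block `opt` (as read
    from C') in exactly len(opt)/2 positions, in a fixed enumeration order."""
    k = len(opt)
    strings = []
    for correct in combinations(range(k), k // 2):
        if len(strings) == j:
            break
        strings.append([opt[t] if t in correct else 1 - opt[t] for t in range(k)])
    return strings


def read_counter(block, opt, j):
    """Decode the counter value by comparing C with the stored copy C'."""
    return counter_strings(opt, j).index(block) + 1


j = 20
k = counter_length(j)                          # binom(6, 3) = 20, so k = 6
opt = [1, 0, 0, 1, 1, 0]
table = counter_strings(opt, j)
for z in range(1, j):
    flips = [t for t in range(k) if table[z - 1][t] != table[z][t]]
    assert len(flips) % 2 == 0                   # half the flips gain, half lose
    block = [b ^ (t in flips) for t, b in enumerate(table[z - 1])]
    assert block == table[z] and read_counter(block, opt, j) == z + 1
    assert sum(a == b for a, b in zip(block, opt)) == k // 2  # OneMax unchanged
```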

7.2.4 Optimizing in Linear Time with Non-Optimal Bits 

The following lemma allows us to optimize a large part of the string in linear time, provided 
that we have some small area B' with “trading bits”, i.e., with bits that are non-optimal. 

Lemma 12. Let 0 < α < 1 be constant. Assume we have two counters C, C' that can count
up to n and a flag bit b that is set to 0. Assume further that we have two blocks B, B' with
|B|, |B'| = ω(log n) such that all bits in B' are non-optimal, and that at least an α fraction
of the bits in B is non-optimal, their positions distributed uniformly at random. Then there
is a (1+1) elitist black-box algorithm that optimizes B and B' in linear time with probability
1 − o(1/n).

Proof. We start with the counters C and C' at 0, and go through the bits in B one by
one, maintaining the following invariants. When C is at i, the first i bits in B are
optimal. When C' is at i', the first i' bits in B' are optimal, and all further bits in B'
are non-optimal. We will call the non-optimal bits in B' trading bits.

Choose 0 < ρ < 1 so small that 2ρ/(1 − ρ) − α < 0. Assume C is at position i and C' at
position i'. If i' = 0, then we simply flip the first bit in B' and increase C', so assume i' > 0
from now on. In each step we flip a coin. If it turns up heads (with probability 1 − ρ), then we
flip the (i + 1)-st bit in B, increase C, flip the i'-th bit of B', and decrease C'. If the offspring
has equal fitness, we accept it. Note that the fitness is equal if and only if the bit in B was
non-optimal, and that we recover one of the non-optimal trading bits in B' in this case. On
the other hand, if the bit in B was optimal in the original string, then the fitness of the new
search point is strictly smaller than the previous one, so that the offspring is immediately
discarded. So we only accept an increase in C if the (i + 1)-st bit of B is correct in the new string.
If the coin flip was tails (with probability ρ), then we just flip the (i + 1)-st bit in B, flip the
(i' + 1)-st bit in B', and increase C' (but do not touch C). Note that we may (and will) accept
the offspring in any case.
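The bookkeeping of this strategy can be simulated without an explicit fitness function, since acceptance only depends on whether the current bit of B is optimal. The sketch below is a hypothetical illustration: `b_optimal[t]` says whether bit t of B is initially optimal, `pool` is |B'|, and the function tracks the counter values i and i' and raises an error if the strategy would run out of trading bits.

```python
import random


def lemma12_run(b_optimal, rho, pool, rng):
    """Simulate the counters of the Lemma 12 strategy.

    i  = value of C  (bits of B fixed so far),
    ip = value of C' (bits of B' currently spent, i.e. turned optimal).
    Returns (iterations, largest number of trading bits in use at any time).
    """
    b = list(b_optimal)
    i = ip = iterations = max_used = 0
    while i < len(b):
        iterations += 1
        if ip == 0:                    # nothing to give back: spend one trading bit
            ip = 1
        elif rng.random() < 1 - rho:   # heads: fix bit i of B, regain a trading bit
            if not b[i]:               # equal fitness, offspring accepted
                b[i] = True
                i += 1
                ip -= 1
            # else: fitness drops by 2 and the offspring is discarded
        else:                          # tails: flip bit i of B, spend a trading bit
            b[i] = not b[i]
            ip += 1
        if ip > pool:
            raise RuntimeError("ran out of trading bits")
        max_used = max(max_used, ip)
    return iterations, max_used


rng = random.Random(4)
alpha, rho = 0.5, 0.1                            # 2 * rho / (1 - rho) - alpha < 0
b_optimal = [t % 2 == 0 for t in range(2000)]    # half of B is non-optimal
iterations, max_used = lemma12_run(b_optimal, rho, pool=100, rng=rng)
print(iterations, max_used)
```

With ρ = 0, every coin flip is heads and each non-optimal bit of B costs exactly two iterations (spend, then regain), which gives a simple deterministic sanity check.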

Evidently, we maintain the invariants mentioned above. Moreover, we spend only an
expected constant number of iterations for optimizing a bit in B, and by the Chernoff bound
the algorithm optimizes B in at most c|B| iterations with probability 1 − e^{−Ω(|B|)} = 1 − o(1/n),
for a suitable c > 0. Once B is optimized (i.e., once the counter C is at position |B|), we flip all
non-optimal bits in B' in one step. The only way the algorithm can fail (except by taking
too long, which only happens with probability o(1/n)) is by running out of trading bits,
so it remains to show that with high probability this does not happen.

Let X_i be the number of trading bits that are used up after the i-th round, and let Δ_i :=
X_{i+1} − X_i be the number of trading bits that we spend in this round. For the sake of
exposition, assume first that there is an unlimited number of trading bits that can be gained
or used in this round (while in fact, the total number of trading bits must stay between 0
and |B'|). If the i-th bit of B was optimal, then the algorithm waits for tails to proceed. This
costs us one trading bit and brings us into the position that the i-th bit of B is non-optimal.
In that position, we either (with probability 1 − ρ) proceed to the (i + 1)-st bit and gain a
trading bit, or we proceed (with probability ρ) to the other position, pay a trading bit, and
pay another one to return to the old position. So if the i-th bit is initially non-optimal, then
the expected number of trading bits that we spend for optimizing the i-th bit is

$$E[\Delta_i \mid i\text{-th bit non-optimal}] = -(1-\rho) + \rho\,\big(2 + E[\Delta_i \mid i\text{-th bit non-optimal}]\big),$$

from which we easily deduce E[Δ_i | i-th bit non-optimal] = −1 + 2ρ/(1 − ρ) and E[Δ_i |
i-th bit optimal] = 1 + E[Δ_i | i-th bit non-optimal] = 2ρ/(1 − ρ). The probability that the
i-th bit is non-optimal is at least α, and so

$$E[\Delta_i] \le \alpha\left(-1 + \frac{2\rho}{1-\rho}\right) + (1-\alpha)\,\frac{2\rho}{1-\rho} = \frac{2\rho}{1-\rho} - \alpha < 0.$$
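The expected spending per bit can be recomputed exactly. The sketch below (hypothetical helper name) solves the linear equation for E[Δ_i | i-th bit non-optimal] with exact fractions and checks that mixing the two cases with weights α and 1 − α gives 2ρ/(1 − ρ) − α.

```python
from fractions import Fraction


def expected_spending(rho, alpha):
    """Expected trading bits spent on one bit of B (unlimited pool), as in the text.

    From E_non = -(1 - rho) + rho * (2 + E_non) we get
    E_non = (-(1 - rho) + 2 * rho) / (1 - rho), and E_opt = 1 + E_non.
    """
    e_non = (-(1 - rho) + 2 * rho) / (1 - rho)
    e_opt = 1 + e_non
    return e_non, e_opt, alpha * e_non + (1 - alpha) * e_opt


rho, alpha = Fraction(1, 10), Fraction(1, 2)
e_non, e_opt, total = expected_spending(rho, alpha)
assert e_non == -1 + 2 * rho / (1 - rho) == Fraction(-7, 9)
assert e_opt == 2 * rho / (1 - rho) == Fraction(2, 9)
assert total == 2 * rho / (1 - rho) - alpha == Fraction(-5, 18)
```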

Now we examine how the drift changes if the number X_i of trading bits that we can gain
in the i-th round is bounded. Since the probability to spend more than b trading bits in
one round goes (geometrically) to zero as b → ∞, there is a constant b_0 > 0 and a constant
ε > 0 such that

$$E[\Delta_i \mid X_i > b_0] < -\varepsilon.$$

Therefore, the number X_i of used trading bits performs a random walk with negative drift
while it is between b_0 and |B'|. Moreover, the probability Pr[|Δ_i| ≥ j] decreases geometrically
in j. Therefore, by the Negative Drift Theorem 5 (with constant r(ℓ) and ℓ = |B'| − 1 − b_0 =
ω(log n)), the probability that any of the X_i exceeds |B'| − 1 for 1 ≤ i ≤ |B| is at most
e^{−Ω(|B'|−1−b_0)} = o(1/n), so we are not out of trading bits after any round. But in every
round we gain at most one trading bit, so if X_i does not exceed |B'| − 1, then at no point
during the i-th round does the algorithm use more than |B'| trading bits. This proves that we
never run out of trading bits with probability 1 − o(1/n). □

We remark without proof that Lemma 12 can be strengthened to hold with probability
1 − p for any p = … if |B|, |B'| = ω(log(1/p)).

7.3 Proof of Theorem 8 

Proof. We split the string into four parts: firstly, a constant number of flag bits indicating
in which phase of the algorithm we are. Some of them we use for the subroutines, but bit b_0
is kept at 0 until the very last phase. Then two counters C, C' that can count up to n.
Further, we have a part B' of the string of size O(log² n), which we use as trading bits, and
the remaining part B.

Now we put all pieces together. We initialize the flag bits as 0 and initialize B uniformly at random. Then we build the counters as described in Lemma 11, using the randomness from B, and indicate with a flag bit when we are finished. We split B' into two parts B'_1 and B'_2 of equal size. Then we use Lemma 10 to optimize B'_1 with high probability, setting a flag bit when finished. When this flag is set, we copy B'_1 into B'_2. We then copy the string B'_2 ⊕ (1, ..., 1) into B'_1, effectively inverting all bits in B'_1. For both copy operations we use the randomness from B. Afterwards we still have a 1/2 − ε fraction of non-optimal bits in B; this follows from Chernoff bounds and the fact that all copy operations together touch o(n) bits. Moreover, all bits in B'_1 are non-optimal. Thus we can apply Lemma 12 to optimize B, B'_1 and B'_2 in linear time with probability 1 − o(1/n). In the last step of this phase, we also set x_0 to 1. (We can do this since the last operation flips all the non-optimal bits in B'_1.)

While x_0 is 1, we do the following. With probability 1 − ln n/n we flip a bit outside of B ∪ {x_0}. With probability ln n/n we flip x_0. There are at most O(log² n) bits outside of B, so this region (except for x_0) will be optimized after an expected number of O(log² n log log n) steps. Moreover, by the Coupon Collector Theorem, the probability that it takes more than c log² n log log n steps to optimize this region is at most 1/n for a suitable constant c > 1 [Doe11, Theorem 1.23]. Hence, when x_0 is flipped, we have found the optimum with probability at least 1 − O(log³ n log log n/n). On the other hand, with probability 1 − (1 − ln n/n)^n ≥ 1 − 1/n this phase takes at most n steps. This concludes the proof. □
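The two probability estimates at the end of this proof are easy to evaluate numerically. The sketch below assumes that x_0 is flipped independently with probability ln n/n in every step. For the step bound c log² n log log n it uses the hypothetical choice c = 2 with natural logarithms. It only illustrates the inequalities (1 − ln n/n)^n ≤ e^{−ln n} = 1/n and 1 − (1 − q)^T ≤ Tq.

```python
import math

def prob_no_flip_within(n, steps):
    """Pr[x_0 has not been flipped after `steps` steps]; flip probability ln(n)/n per step."""
    q = math.log(n) / n
    return (1 - q) ** steps

for n in (10**3, 10**6):
    # Phase longer than n steps: (1 - ln n/n)^n <= exp(-ln n) = 1/n.
    long_phase = prob_no_flip_within(n, n)
    # x_0 flipped within the first T steps, T = c * ln^2 n * ln ln n with hypothetical c = 2.
    T = math.ceil(2 * math.log(n) ** 2 * math.log(math.log(n)))
    too_early = 1 - prob_no_flip_within(n, T)
    print(n, long_phase, too_early)
```

For n = 10^3 the early-flip probability computed this way is still large. This is a reminder that these bounds are asymptotic statements.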

Remark 13. In the proof of Theorem 8 the main part is Las Vegas. We have failure
probabilities only for initializing the counter, for running out of trading bits, and for the optimization of the bits which are reserved for the counter. The first two failure probabilities can be made superpolynomially (in fact, exponentially) small. The failure probability stemming from the last phase can be decreased by using an iterated counter, which reduces the number of bits that are blocked for the operation of the counter.
More precisely, we start with the counter described above, which can count from one to j_1 = j. A second counter is implemented, again using Lemma 11, to count from 1 to j_2 = Θ(log j_1). A third one counts from one to j_3 = Θ(log j_2), and so on, until the number of bits that need to be blocked for the counter is at most constant. Then we optimize the main part of the string as before, but we make sure that j_1 + j_2 + ··· = O(log² n) bits remain that are all non-optimal. With these bits, we can optimize the region of the first counter without error probability, using the second counter and j_1 of the non-optimal bits. Then we optimize the region of the second counter using the third counter, and so on, until we end up with a counter that has only constant size. We then optimize this last counter with RLS steps as described in the proof. Effectively, this allows us to design an algorithm (by flipping the last bit with probability ln n/n) that needs time O(n) with probability 1 − O(log n/n).
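The following sketch computes the chain of counter ranges j_1, j_2, ... to show how quickly the iterated counter shrinks. It assumes j_{i+1} = ⌈log₂ j_i⌉, whereas the text only specifies Θ(log j_i). It also uses a hypothetical stopping size of 4 for "at most constant". The function name counter_chain is illustrative.

```python
def counter_chain(j1, stop=4):
    """Ranges j_1, j_2, ... of an iterated counter (sketch).

    Assumptions: j_{i+1} = ceil(log2 j_i), and the chain stops once a range is
    at most `stop` (a stand-in for "constant size").
    """
    chain = [j1]
    while chain[-1] > stop:
        chain.append((chain[-1] - 1).bit_length())  # equals ceil(log2 j) for j >= 2
    return chain

print(counter_chain(10**6))  # [1000000, 20, 5, 3]
```

Even for j_1 = 2^64 the chain has only four entries. Its length grows roughly like the iterated logarithm of j_1.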

Alternatively, although it gives neither a Monte Carlo nor a Las Vegas complexity, there is an algorithm (flipping the last bit with probability 1/n) with an event E_bad of probability Pr[E_bad] = O(1/n) such that, conditioned on ¬E_bad, the algorithm has an expected runtime of O(n). Here E_bad is the event that either initialization fails or the last bit is flipped too early.

Remark 14. If we had only one bit of additional memory, we could use it as an indicator bit for random local search. In any step of the algorithm, we could flip this bit with some small probability (e.g., with probability 1/(n log n)) and then proceed with random single-bit flips from this point on. If the success probability of the Monte Carlo algorithm is at least 1 − O(1/log n) (we proved much stronger bounds), then this results in a Las Vegas algorithm with linear expected runtime. Unfortunately, it is unclear how to make use of high success probabilities without an additional bit of memory, so our results do not imply a linear Las Vegas runtime.
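A back-of-the-envelope version of the runtime calculation behind Remark 14 rests on three assumptions. First, the Monte Carlo algorithm finds the optimum within T_MC = O(n) steps with probability 1 − O(1/log n). Second, the indicator bit is flipped with probability 1/(n log n) in every step, so by memorylessness the expected waiting time for the flip is n log n. Third, RLS started from an arbitrary search point needs expected time E[T_RLS] = O(n log n). Informally,

$$\begin{aligned}
E[T] &\le T_{\mathrm{MC}} + \big(\Pr[\text{MC fails}] + \Pr[\text{flip within } T_{\mathrm{MC}} \text{ steps}]\big)\cdot\big(n\log n + E[T_{\mathrm{RLS}}]\big)\\
&\le O(n) + \Big(O\big(\tfrac{1}{\log n}\big) + \tfrac{T_{\mathrm{MC}}}{n\log n}\Big)\cdot O(n\log n) = O(n),
\end{aligned}$$

since T_MC/(n log n) = O(1/log n).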

8 The (1+λ) Elitist Black-Box Complexities of OneMax

We have already seen in Section 6 that a slight increase of the population size of the elitist black-box model can significantly simplify the OneMax problem. In the (2+1) model considered in Section 6 we were in the comfortable situation that we could use the two strings of the memory to encode an iteration counter. In this section we regard the (1+λ) elitist black-box model. Intuitively, this model is less powerful than the (λ+1) model since we have to base our sampling strategies solely on the one search point in the memory. Still, the model allows us to check and compare several alternatives at the same time, so it should be considerably easier than the (1+1) situation. The core idea of the following theorem is to divide the bit string into blocks of size log λ each and to optimize these blocks iteratively by exhaustive search.

Theorem 15. Let ε, C > 0, and let 1 < λ < 2^{n^{1−ε}}. For suitable p = O(log² n log log n log λ/n) there exists a (1+λ) p-Monte Carlo elitist black-box algorithm that needs at most O(n/log λ) generations on OneMax.
We emphasize that the bound in Theorem 15 is stated in terms of the number of generations, not in terms of function evaluations. We feel that this is the more useful measure, in particular when the λ offspring can be generated in parallel. An algorithm optimizing for the number of function evaluations can be substantially different from one minimizing the number of generations.

Proof of Theorem 15. We initialize the algorithm by implementing the (1+1) elitist counter as described in Lemma 11. We split the remaining string into blocks of length at most ⌊log₂ λ⌋ ≥ 1, and we want to optimize each block with exhaustive search. There are j = ⌈n/⌊log₂ λ⌋⌉ = Ω(n^ε) such blocks, and we thus apply Lemma 11 with this j. This requires O(log(1/p) log j (log log j)²) = O(log³ n) generations. The counter blocks O(log j) bits, which we cannot touch during the optimization of the blocks (except, of course, for operating the counter).

We then optimize the n − O(log j) bits which are not blocked for the counter. We optimize ⌊log₂ λ⌋ bits in each iteration by sampling all 2^{⌊log₂ λ⌋} ≤ λ possible entries in the block. In each sampled offspring the counter is increased by one (when compared with the counter of the parent individual). In the last generation we possibly optimize a block that is smaller than ⌊log₂ λ⌋, but the routine is the same, i.e., exhaustive search. This optimization routine is deterministic, and it requires at most j generations.
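The deterministic block phase of this proof can be sketched in Python, with some simplifications. The block index is kept in an ordinary variable instead of in the neutral counter of Lemma 11. The initial string is uniformly random. The counter setup and the final RLS phase are omitted. The function one_plus_lambda_blocks is hypothetical. The sketch shows that every generation uses at most 2^{⌊log₂ λ⌋} ≤ λ offspring and that ⌈n/⌊log₂ λ⌋⌉ generations suffice.

```python
from itertools import product
import random

def onemax(x, z):
    """OneMax_z(x): number of positions where x agrees with the target z."""
    return sum(a == b for a, b in zip(x, z))

def one_plus_lambda_blocks(z, lam, rng):
    """Blockwise exhaustive search with (1 + lam) elitist selection (sketch).

    Simplification: the index of the current block is a Python variable here,
    whereas the proof stores it in the neutral counter of Lemma 11.
    """
    n = len(z)
    ell = lam.bit_length() - 1            # floor(log2 lam), at least 1 for lam >= 2
    x = [rng.randint(0, 1) for _ in range(n)]
    generations = 0
    for start in range(0, n, ell):
        end = min(start + ell, n)
        offspring = []
        for bits in product((0, 1), repeat=end - start):  # at most 2**ell <= lam offspring
            y = list(x)
            y[start:end] = bits
            offspring.append(y)
        best = max(offspring, key=lambda y: onemax(y, z))
        if onemax(best, z) >= onemax(x, z):  # elitist: a worse point is never accepted
            x = best
        generations += 1
    return x, generations

rng = random.Random(1)
target = [rng.randint(0, 1) for _ in range(30)]
x, gens = one_plus_lambda_blocks(target, 8, rng)
print(x == target, gens)  # True 10
```

The parent's own block entries are among the sampled candidates, so the best offspring is never worse than the parent. The elitist acceptance test therefore always succeeds here.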

Once the counter of the parent individual shows j, we need to optimize those bits that were reserved for the counter. We do this in the same way as in the (1+1) situation (see Section 7.3). That is, we use Randomized Local Search (RLS, Algorithm 2) on the yet unoptimized part, and with some probability p' = O(log n log λ/n) we flip the bit x_0 that indicates the RLS phase. When bit x_0 is flipped for the first time, the remainder of the bit string is optimized with probability at least 1 − p/2 for a suitable choice of p'. The probability that this takes more than Cn/log λ steps is O(1/n) = o(p) for a suitable choice of C > 0. □

We remark without formal proof that the requirement on p can be relaxed by regarding an iterated counter (cf. Remark 13). If λ is a small constant, then we may use p = O(log n/n) as in the (1+1) case. On the other hand, if λ is a sufficiently large constant, then we can optimize the constantly many bits of the last counter and x_0 simultaneously in just one step. In this case, we may even use p = …, i.e., for all such p there are (1+λ) p-Monte Carlo black-box algorithms using only O(n/log λ) generations. Despite these small failure probabilities, it is still not clear how to derive an upper bound on the corresponding Las Vegas complexities.

9 The (μ+1) Elitist Black-Box Complexities of OneMax for μ ≥ 2

As mentioned earlier, the (μ+1) model is quite powerful as it allows us to store information about the search history. We shall use this space to implement a variant of the random sampling optimization strategy of Erdős and Rényi [ER63] (see Section 3). To apply this random sampling strategy in our setting, the approach must satisfy the ranking-basedness condition, the memory restriction, and the elitist selection requirement. Luckily, the first two problems have been solved in previous works, though not for both restrictions simultaneously (see Section 3).

In the elitist model we do not obtain absolute fitness values but merely learn the ranking of the search points induced by the fitness function. It has been shown in [DW14b] that the ranking restriction changes the complexity of the random sampling strategy by at most a (small) constant factor. That is, there exists a function t(n) = O(n/log n) with the following property: for n large enough, the ranking of a sequence s_1, ..., s_{t(n)} of random strings in {0,1}^n induced by the OneMax function uniquely determines the target string with probability at least 1 − δ√n exp(−A√n/log n), where δ and A are positive constants.
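For tiny n, brute force makes the difference between observing fitness values and observing only their ranking concrete. The sketch below is not the algorithm of [DW14b]. It simply enumerates all candidate targets z' ∈ {0,1}^n and keeps those that would produce the same observations as the true target, either the exact OneMax values or only the pairwise comparison pattern. The helper names are hypothetical.

```python
from itertools import product
import random

def onemax(x, z):
    """OneMax_z(x): number of positions where x agrees with the target z."""
    return sum(a == b for a, b in zip(x, z))

def ranking(values):
    """Pairwise comparison pattern of the fitness values (all a ranking-based algorithm sees)."""
    return tuple((a > b) - (a < b) for i, a in enumerate(values) for b in values[i + 1:])

def consistent_targets(queries, target, ranking_only=False):
    """Brute force over {0,1}^n: all z' whose observations equal those of `target`."""
    def observe(z):
        values = [onemax(q, z) for q in queries]
        return ranking(values) if ranking_only else values
    seen = observe(target)
    return [z for z in product((0, 1), repeat=len(target)) if observe(z) == seen]

rng = random.Random(7)
n = 10
z = tuple(rng.randint(0, 1) for _ in range(n))
queries = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(2 * n)]
print(len(consistent_targets(queries, z)), len(consistent_targets(queries, z, ranking_only=True)))
```

With the n unit vectors as queries, exact values always identify the target. The ranking, however, cannot distinguish the all-zeros target from the all-ones target, because all queries then tie. For random queries, uniqueness holds only with high probability, as in the statement above.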

Because of the restricted memory, we may not be able to store all t(n) search points. But, following previous work (see, for example, [DW14c] for a description of this method invented in [DJK+11]), we can split the string into smaller blocks of size m each such that t(m) ≤ μ. We then optimize these n/m blocks iteratively. This differs from the strategy in Section 8, where all 2^ℓ possible entries for a block of size ℓ are sampled.

The last challenge that we need to handle is the elitist selection. The i-th phase is the one in which we sample the search points required for optimizing the i-th block. Intuitively, if after the i-th phase we replace the entries in the i-th block by the optimal ones, this should give us enough flexibility (in terms of fitness increase) to replace the entries in the (i+1)-st block by the random samples s_1, ..., s_t needed to determine the optimal entries of the (i+1)-st block. The theorem below shows that this is indeed possible, with high probability.

Theorem 16. For constant μ, the (μ+1) (Monte Carlo and Las Vegas) elitist black-box complexity of OneMax is Θ(n).

For μ = ω(log² n/log log n) ∩ O(n/log n), the (μ+1) Monte Carlo elitist black-box complexity of OneMax is Θ(n/log μ).

There exists a constant C > 1 such that for μ ≥ Cn/log n, the (μ+1) (Monte Carlo and Las Vegas) elitist black-box complexity is Θ(n/log n).

Proof. The lower bounds follow from Theorem 6. For constant μ the upper bound follows from the (2+1) elitist algorithm in Theorem 7.

The result for μ ≥ Cn/log n follows from the result on the ranking-based black-box complexity in [DW14b]. For the Las Vegas result, recall that the random sampling technique of Erdős and Rényi can be derandomized, as commented in [DW14c, Section 3.2]. That is, there exist a function t(n) = O(n/log n) and sequences s_1, ..., s_{t(n)} ∈ {0,1}^n such that the fitness values of these samples uniquely determine the target string of the OneMax function. This, together with the ranking-based strategy of [DW14b], implies the upper bound in the third statement. For the lower bound, a simple information-theoretic argument shows the following: if the target string is chosen uniformly at random, then with high probability n/(2 log n) samples are not enough to find the optimum [ER63].
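The counting behind this information-theoretic bound can be sketched for a deterministic algorithm and a uniformly random target; randomized algorithms are covered by Yao's minimax principle. Logarithms are taken to base 2. Every query is determined by the sequence of previous answers, each of which lies in {0, ..., n}. So at most (n+1)^q distinct points can ever be queried within q steps. Hence

$$\Pr[\text{optimum queried within } q \text{ steps}] \le \frac{\sum_{i=0}^{q-1}(n+1)^{i}}{2^{n}} \le \frac{(n+1)^{q}}{2^{n}} = 2^{\,q\log_2(n+1)-n},$$

and for q = n/(2 log₂ n) the exponent equals −n/2 + o(n).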

To prove the statement for intermediate values of μ, it suffices to show the case μ = ω(log² n/log log n) ∩ O(n/log² n). The case μ' = ω(n/log² n) ∩ O(n/log n) follows from the case μ = n/log² n, for two reasons. First, the (μ'+1)-complexities can only be smaller than the corresponding (μ+1)-complexities. Second, Θ(n/log μ) = Θ(n/log μ') = Θ(n/log n).

So we may assume that μ = ω(log² n/log log n) ∩ O(n/log² n). Let k = Θ(μ log μ) = ω(log² n) be such that, for some t ≤ μ, the following holds: the ranking of a random sequence s_1, ..., s_t ∈ {0,1}^k induced by the OneMax values of an arbitrary OneMax function Om_z determines the target string z with probability at least 1 − δ√k exp(−A√k/log k). Here δ and A are the constants implicit in the result of [DW14b].

Setting up the counter: The algorithm starts by building a neutral counter (a counter as in Lemma 11) for counting values from one to ⌈n/k⌉. As in previous proofs, we denote the counter by C. Its length is O(log n).

We initialize the algorithm by sampling the string with all zeros in the first 2|C| + 3 positions and random entries in the remaining positions. We place C in the positions {4, ..., |C| + 3}. The optimal entries of C will be copied into part C', which is placed in positions {|C| + 4, ..., 2|C| + 3}. First we use the (2+1) linear optimization strategy from Theorem 7 to optimize part C. This requires O(|C|) = O(log n) (deterministic) iterations. At the end of this phase we set the first bit to one, indicating that we are now ready to copy C into C'. We do so by applying the strategy from Lemma 9 with B := {2|C| + 4, ..., n}.

When C is copied into C', we flip the second flag bit and continue by initializing the counter. This requires flipping |C|/2 bits from their correct into their non-optimal state. Again we apply Lemma 9 with B as above. By Chernoff's bound, B satisfies the requirements of Lemma 9 with high probability.

By comparing C with C' we recognize when the counter is initialized. We are then ready to enter the main part of the algorithm, in which we optimize part B. At this point the first three bits are 110. Moreover, at most O(log n) bits in B have been touched at this point. Hence, as also commented in the proof of Theorem 8 in Section 7.3, Chernoff's bound yields the following: with probability at least 1 − exp(−ε²n/3), after this copy operation at least a 1/2 − ε fraction of B is non-optimized, for any constant ε > 0. The −ε²n/3 term in this bound (in lieu of the typical −ε²n/2 expression) accounts for two facts: O(log n) bits have been optimized during the implementation and initialization of the counter, and we regard the substring B of size only n − O(log n).

Optimization of the Main Part Using Random Sampling: We divide part B into blocks of length k each; only the last block, which will be treated differently, may have smaller size. We aim at optimizing the blocks iteratively.

To this end, we first show that, with high probability, we can determine the target entries in each of the ⌈|B|/k⌉ blocks from the t random samples. Recall that for each block individually this probability is 1 − δ√k exp(−A√k/log k). By a union bound, the probability that it works for all blocks is thus at least 1 − δ(n/√k) exp(−ω(√(log² n))) = 1 − O(n^{−c}) for any positive constant c.

Fix 0 < ε, ε' < 1/6. We show next that, with high probability, the initial fitness contribution of each block is between (1/2 − ε)k and (1/2 + ε)k. The only potential exception is the last block, which we can and do ignore in the following. After initialization of the algorithm, the expected fitness contribution of each block is 1/2 times the length of the block, i.e., k/2. During the setup and the initialization of the counter, we have changed at most O(log n) bits in B, their positions being uniformly distributed in B. Therefore, after the setup and initialization of the counter, each block has an expected fitness contribution of (1 − o(1))k/2. By Chernoff's bound, its contribution is between the desired (1/2 − ε)k and (1/2 + ε)k with probability at least 1 − 2exp(−δk) for some positive constant δ. By a union bound, the fitness contribution of every (but potentially the last) block is thus between (1/2 − ε)k and (1/2 + ε)k with probability at least 1 − 2(n/k)exp(−δk). Together with the requirement k = ω(log² n), this shows that the failure probability is at most n^{−c} for any positive constant c. We may therefore condition all the following statements on this event.

By the same reasoning as above, consider the event that, for all blocks i and all j ∈ [t], the fitness contribution of the random string s_j in block i is between (1/2 − ε')k and (1/2 + ε')k. This event has probability at least 1 − (n/log k)exp(−δ'k) for some positive constant δ'. As above, this expression is at least 1 − n^{−c} for any positive constant c. We may therefore also condition on this event.

Let us assume that for some block 1 ≤ i ≤ ⌈|B|/k⌉ − 1 we have sampled the required t random strings. We show how to optimize block i + 1. (The first and the last block need to be handled differently and will be considered below.) That is, the entries of the first i − 1 blocks are already optimized, and the counter of the μ strings in the population is set to i. In t of these strings, the entries in the i-th block are taken from {0,1}^k uniformly at random.¹ The next t queries are as follows. In each query, we replace the entries in the i-th block by the optimal ones, and we increase the neutral counter by one. We also replace the entries in the (i+1)-st block by entries that are taken from {0,1}^k independently and uniformly at random. Let us first argue that these queries will be accepted into the population. When we replace the initial entries in block i + 1 by the random string s_j, we lose a fitness contribution of at most (ε + ε')k. On the other hand, replacing the random entries in the i-th block by the optimal ones gives a fitness increase of at least (1/2 − ε')k > (ε + ε')k. The neutral counter does not have any effect on the fitness and can thus be ignored.
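Written out, the acceptance argument compares the gain in block i with the loss in block i + 1, on the two events conditioned on above:

$$\text{gain} \ge k - \left(\tfrac12+\varepsilon'\right)k = \left(\tfrac12-\varepsilon'\right)k, \qquad \text{loss} \le \left(\tfrac12+\varepsilon\right)k - \left(\tfrac12-\varepsilon'\right)k = (\varepsilon+\varepsilon')k,$$

$$\left(\tfrac12-\varepsilon'\right)k > (\varepsilon+\varepsilon')k \iff \varepsilon + 2\varepsilon' < \tfrac12.$$

The last inequality holds because ε, ε' < 1/6 gives ε + 2ε' < 1/6 + 2/6 = 1/2.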

It remains to describe how to optimize the first and the last block. For the last block, we simply use the (2+1) linear elitist optimization strategy of Theorem 7. The size of this block is at most k = O(μ log μ) = O(n/log μ), so this does not affect the overall runtime by more than a constant factor. Of course, in each query for the last block we increase the neutral counter by one, and we replace the random strings in the penultimate block by the optimal ones.

Getting the desired random samples into the first block is a bit more challenging. We need to respect the elitist selection rule and thus need to make sure that the random samples in the first block are accepted. A simple trick guarantees this. We first optimize the first block with the linear (2+1) elitist strategy from Theorem 7 (note again that the size of the block is k = O(n/log μ)). We then invert all the bits in the first block by applying the strategy from Lemma 9 to the first block and part B. Since this affects at most k = O(n/log μ) = o(n) bits in B, all the probabilistic statements that we made about B above still hold. At this point the fitness contribution of the first block is zero, and all random samples in the first block can thus be accepted.

¹ To be more precise, one substring is the median query required for the ranking-based algorithm from [DW14b]. See Lemma 12 in [DW14b] for the details of this query, which is needed to verify that the fitness level k/2 is correctly identified. For us it only matters that we need one additional non-random query, whose fitness contribution is ⌈k/2⌉ with very high probability. We ignore this query in this presentation, as it obviously does not create any problems for our approach.

There are ⌈|B|/k⌉ = O(n/k) blocks in total. For each block we sample t = O(k/log k) random strings to determine the optimum. The overall number of samples performed in this phase of the algorithm is thus O(n/log k) = O(n/log μ).

Optimizing the Counter: Once all the blocks have been optimized, that is, as soon as s_t has been sampled in the last block, we sample one more search point. It replaces s_t by the optimal entries for this block and has the third bit set to one (this bit has been zero in all previous iterations). This indicates that we can now go to the next phase, in which we optimize all the bits in positions {1, 2} ∪ {4, ..., 2|C| + 3} using the linear (2+1) elitist strategy from Theorem 7. Finally, we check whether replacing the third bit by a zero improves the fitness further. This last phase is deterministic and requires O(|C|) = O(log n) queries. □

10 Remark on (μ, λ) Elitist Black-Box Complexities

In the elitist black-box model, it can be significantly easier to optimize a function when we are allowed to use so-called comma strategies instead of the plus strategies described by Algorithm 1. To make things formal, consider an algorithm that follows the scheme of Algorithm 1, with Line 8 replaced by

Set X ← {y^(1), ..., y^(λ)};

and with Line 9 running only to λ − μ. We call such an algorithm a (μ, λ) elitist algorithm. That is, a (μ, λ) elitist algorithm has to keep the μ best sampled offspring in each iteration. It is allowed (and forced) to ignore the parent solutions, which consequently can be of better fitness. As mentioned in the introduction, the term elitist selection may not be appropriate here, depending on the context, and truncation selection may be the preferable expression. In any case, if the algorithm wants to maintain parts of the parental population, it can simply resample those individuals that should be kept.
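The plus and comma schemes differ only in the pool from which the next population is selected. The following Python sketch contrasts the two. The helper names are hypothetical, fitness is maximized, and ties are broken by list order (an assumption). The sketch also shows the resampling trick: if a parent is resampled as one of the offspring, the comma step keeps it just as the plus step would.

```python
def plus_selection(parents, offspring, mu, fitness):
    """(mu + lambda): keep the mu best points among parents and offspring."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]

def comma_selection(parents, offspring, mu, fitness):
    """(mu, lambda): the parents are ignored; keep the mu best offspring."""
    return sorted(offspring, key=fitness, reverse=True)[:mu]

onemax = sum  # OneMax with the all-ones target: number of ones
parents = [(1, 1, 1)]
offspring = [(1, 0, 0), (0, 0, 0)]
print(plus_selection(parents, offspring, 1, onemax))   # [(1, 1, 1)]
print(comma_selection(parents, offspring, 1, onemax))  # [(1, 0, 0)]
# Resampling the parent as one of the offspring lets the comma step keep it.
print(comma_selection(parents, parents + offspring[:1], 1, onemax))  # [(1, 1, 1)]
```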

In particular, (μ, λ) elitist algorithms can do restarts. Therefore, as discussed in Section 2.1, to bound the Las Vegas complexity of (μ, λ) elitist algorithms, it suffices to bound their Monte Carlo complexity. Moreover, for all λ' with μ + λ' ≤ λ, we can imitate every (μ + λ') elitist black-box algorithm by a (μ, λ) elitist black-box algorithm. Hence, from Theorems 8 and 15 we get the following corollary.

Corollary 17. The (1,2) (Las Vegas and Monte Carlo) elitist black-box complexity of OneMax is O(n).

For any λ > 2, there are (1,λ) (Las Vegas and Monte Carlo) elitist black-box algorithms that need at most O(n/log λ) generations on OneMax.


Asymptotically, these bounds are tight, since matching lower bounds can be obtained by 
the same information-theoretic arguments as used in Theorem 6. We can easily improve the 
bounds in Corollary 17 as follows. 

Theorem 18. The (1,2) Las Vegas elitist black-box complexity of OneMax is at most 2n + 1, and the corresponding algorithm needs at most n + 1 generations.

For any λ > 2 there are (1,λ) Las Vegas and Monte Carlo elitist black-box algorithms that need at most ⌈n/⌊log₂ λ⌋⌉ generations on OneMax.

Proof. We first regard the (1,2) situation. Initialize the algorithm with the string x_1 = (1, 0, ..., 0). We maintain the following invariant: at the beginning of iteration t, the string x_t in the memory has entry 1 in position t and zeros in all positions i > t.

In iteration t we sample the two search points x_t ⊕ e_{t+1} and x_t ⊕ e_t ⊕ e_{t+1}, where for all j ∈ [n] the string e_j is the string with all entries except the j-th one set to zero. That is, we either flip only the (t+1)-st bit in x_t or we flip both the t-th and the (t+1)-st bit. (In iteration n there is no (n+1)-st bit, and we simply sample x_n and x_n ⊕ e_n.) Since the two offspring differ in exactly one position, one of them has strictly better fitness than the other, and we (necessarily) keep the better one. At the end of the t-th iteration, the search point in the memory is thus optimized in the first t positions. After the n-th iteration, the optimum is found.

For the (1,λ) situation we simply apply the previous idea with an exhaustive search on blocks of length ℓ := ⌊log₂ λ⌋. That is, we always move the one by ℓ positions to the right while at the same time testing all 2^ℓ ≤ λ possible entries in these ℓ positions.

Both algorithms are deterministic and therefore Las Vegas. □ 
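The deterministic (1,λ) strategy from this proof can be simulated directly. In the Python sketch below (the function one_comma_lambda is hypothetical), the single 1 behind the processed prefix marks where the next block starts. The algorithm could therefore read the block position off the search point itself; the loop variable start merely mirrors that position, as the internal assertion checks. For λ = 2 the two offspring are exactly x_t ⊕ e_{t+1} and x_t ⊕ e_t ⊕ e_{t+1}.

```python
from itertools import product
import random

def onemax(x, z):
    """OneMax_z(x): number of positions where x agrees with the target z."""
    return sum(a == b for a, b in zip(x, z))

def one_comma_lambda(z, lam):
    """(1, lam) comma strategy with a moving marker bit; returns (x, evaluations, generations)."""
    n = len(z)
    ell = lam.bit_length() - 1          # ell = floor(log2 lam), at least 1 for lam >= 2
    x = [0] * n
    x[0] = 1                            # x_1 = (1, 0, ..., 0)
    evaluations, generations = 1, 0     # the initial search point is evaluated once
    start = 0
    while start < n:
        # Invariant: the rightmost 1 of x sits at position `start`.
        assert max(i for i, b in enumerate(x) if b) == start
        end = min(start + ell, n)
        offspring = []
        for bits in product((0, 1), repeat=end - start):  # at most 2**ell <= lam offspring
            y = list(x)
            y[start:end] = bits         # exhaustive search on the current block
            if end < n:
                y[end] = 1              # move the marker behind the block
            offspring.append(y)
        # Comma selection: the parent is discarded; offspring differ only inside the
        # block, so exactly one of them (the one matching z there) has maximal fitness.
        x = max(offspring, key=lambda y: onemax(y, z))
        evaluations += len(offspring)
        generations += 1
        start = end
    return x, evaluations, generations

rng = random.Random(42)
target = [rng.randint(0, 1) for _ in range(20)]
x, evals, gens = one_comma_lambda(target, 2)
print(x == target, evals, gens)  # True 41 20
```

The generation count returned here excludes the initial sample. For λ = 2 the evaluation count matches the bound 2n + 1 of Theorem 18.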

For completeness, we note that the (1,1) (Las Vegas or Monte Carlo) complexity of OneMax is Θ(2^n). The Las Vegas upper bound is given by random sampling, and it implies the Monte Carlo upper bound as discussed in Section 2.1. For the lower bound, the algorithm does not get any information about the search point it stores, except whether it is the optimum. Therefore, the problem is at least as hard as the needle-in-a-haystack problem Needle, in which all search points except the optimum have the same fitness. Even if we give the algorithm access to infinite memory, the following holds for any 0 < c < 1: with probability at least 1 − c, the optimum of Needle is not found within c2^n steps. This proves the lower bounds.


11 Conclusions 

We have analyzed black-box complexities of OneMax with respect to (μ+λ) memory-restricted ranking-based algorithms. Moreover, we have shown that the complexities do not change if we also require the algorithms to be elitist, provided that we regard Monte Carlo complexities. For different settings of μ and λ we have seen that such algorithms can be fairly efficient, attaining the information-theoretic lower bounds.

An interesting open question arising from our work is a tight bound for the Las Vegas complexity of OneMax in the (1+1) elitist black-box model. We have sketched in Section 7 the main difficulties in turning our Monte Carlo algorithm into a Las Vegas heuristic. The possible discrepancy between these two notions also raises the question of which problems can be optimized substantially more efficiently with restarts than without. Some initial findings on this aspect can be found in the literature, e.g., [Jan02], but no strong characterization exists.